--------------Lode Runner--------------
A 4am crack                  2018-10-20
-------------------. updated 2020-06-24
                   |___________________

Name: Lode Runner
Genre: arcade
Year: 1983
Credits: Doug Smith
Publisher: Broderbund Software
Platform: Apple ][+ or later
Media: 5.25-inch disk
Sides: 1
OS: custom
Similar cracks:
  #280 Championship Lode Runner

                   ~

               Chapter 0
 In Which Various Automated Tools Fail
          In Interesting Ways


COPYA
  immediate disk read error

Locksmith Fast Disk Backup
  unable to read any track

EDD 4 bit copy (no sync, no count)
  read errors on tracks $0E, $11, $14,
  $17, $1A, $1D, $20, and $22
  copy just hangs on boot

Copy ][+ nibble editor
  T03-T0C appear to be mostly normal
    with modified address epilogue
  Some higher tracks appear to have
    4-4 encoded nibbles

Disk Fixer
  ["O" -> "Input/Output Control"]
    set CHECKSUM ENABLED=NO
  T03-T0C readable

Why didn't COPYA work?
  so many reasons

Why didn't Locksmith FDB work?
  modified epilogues on some tracks,
  non-sector-based data on others

Why didn't my EDD copy work?
  I don't know. Looking through old
  bit copy parameter files, it appears
  that on tracks $0D and higher, data
  is stored on quarter tracks every 1.5
  tracks ($0D.25, $0E.75, &c.) but I
  was unable to use these parameters to
  create a working protected backup of
  my original disk.

This is going to be one of those
"capture the game in memory and rebuild
it from the ground up" cracks.

Next steps:

  1. Trace bootloader
  2. Capture game code in memory
  3. Write game to a standard disk and
     build a bootloader to load it
  4. Declare victory (*)

(*) go to the gym

                   ~

               Chapter 1
      In Which We Brag About Our
           Humble Beginnings


I have two floppy drives, one in slot 6
and the other in slot 5. My "work disk"
(in slot 5) runs Diversi-DOS 64K, which
is compatible with Apple DOS 3.3 but
relocates most of DOS to the language
card on boot. This frees up most of
main memory (only using a single page
at $BF00..$BFFF), which is useful for
loading large files or examining code
that lives in areas typically reserved
for DOS.

[S6,D1=original disk]
[S5,D1=my work disk]

The floppy drive firmware code at $C600
is responsible for aligning the drive
head and reading sector 0 of track 0
into main memory at $0800. Because the
drive can be connected to any slot, the
firmware code can't assume it's loaded
at $C600. If the floppy drive card were
removed from slot 6 and reinstalled in
slot 5, the firmware code would load at
$C500 instead.

To accommodate this, the firmware does
some fancy stack manipulation to detect
where it is in memory (which is a neat
trick, since the 6502 program counter
is not generally accessible). However,
due to space constraints, the detection
code only cares about the lower 4 bits
of the high byte of its own address.

Stay with me, this is all about to come
together and go boom.

$C600 (or $C500, or anywhere in $Cx00)
is read-only memory. I can't change it,
which means I can't stop it from
transferring control to the boot sector
of the disk once it's in memory. BUT!
The disk firmware code works unmodified
at any address. Any address that ends
with $x600 will boot slot 6, including
$B600, $A600, $9600, &c.
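Here's that "low 4 bits" rule as a
tiny Python model. It's just the
arithmetic, not the ROM code, and the
function names are made up:

```python
def boot_slot(hi_byte):
    # Only the low nibble of the high
    # address byte matters, so $C6,
    # $B6, and $96 all mean slot 6.
    return hi_byte & 0x0F


def slot_x16(hi_byte):
    # Slot x16, the form used as X in
    # $C08x,X ($60 for slot 6)
    return boot_slot(hi_byte) << 4


for hi in (0xC6, 0xB6, 0x96, 0xC5):
    print(f"${hi:02X}00 -> slot",
          boot_slot(hi))
```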

; copy drive firmware to $9600
*9600<C600.C6FFM

; and execute it
*9600G
...reboots slot 6, loads game...

Now then:

]PR#5
...
]CALL -151

*9600<C600.C6FFM

*96F8L

96F8-   4C 01 08    JMP   $0801

That's where the disk controller ROM
code ends and the on-disk code begins.
But $9600 is part of read/write memory.
I can change it at will. So I can
interrupt the boot process after the
drive firmware loads the boot sector
from the disk but before it transfers
control to the disk's bootloader.

; instead of jumping to on-disk code,
; copy boot sector to higher memory so
; it survives a reboot
96F8-   A0 00       LDY   #$00
96FA-   B9 00 08    LDA   $0800,Y
96FD-   99 00 28    STA   $2800,Y
9700-   C8          INY
9701-   D0 F7       BNE   $96FA

; turn off slot 6 drive motor
9703-   AD E8 C0    LDA   $C0E8

; reboot to my work disk in slot 5
9706-   4C 00 C5    JMP   $C500

*BSAVE TRACE,A$9600,L$109
*9600G
...reboots slot 6...
...reboots slot 5...

]BSAVE OBJ.0800-08FF,A$2800,L$100

Now we get to(*) trace the boot process
one sector, one page, one instruction
at a time.

(*) If you replace the words "need to"
    with the words "get to," life
    becomes amazing.

                   ~

               Chapter 2
    In Which It Is Not At All Clear
            What's Going On


]CALL -151

; move boot sector code back into place
*800<2800.28FFM

*801L

; clear hi-res graphics screens (both)
0801-   A0 00       LDY   #$00
0803-   A9 20       LDA   #$20
0805-   A2 40       LDX   #$40
0807-   84 00       STY   $00
0809-   85 01       STA   $01
080B-   98          TYA
080C-   91 00       STA   ($00),Y
080E-   C8          INY
080F-   D0 FB       BNE   $080C
0811-   E6 01       INC   $01
0813-   CA          DEX
0814-   D0 F6       BNE   $080C

; show hi-res graphics screen 1
0816-   2C 52 C0    BIT   $C052
0819-   2C 57 C0    BIT   $C057
081C-   2C 54 C0    BIT   $C054
081F-   2C 50 C0    BIT   $C050

; save slot number (x16)
0822-   A6 2B       LDX   $2B
0824-   86 08       STX   $08

; decrypt rest of boot0 and store it in
; zero page (starting at $60)
0826-   EA          NOP
0827-   EA          NOP
0828-   A0 00       LDY   #$00
082A-   EA          NOP
082B-   EA          NOP
082C-   B9 50 08    LDA   $0850,Y
082F-   EA          NOP
0830-   EA          NOP
0831-   49 A5       EOR   #$A5
0833-   EA          NOP
0834-   EA          NOP
0835-   99 60 00    STA   $0060,Y
0838-   EA          NOP
0839-   EA          NOP
083A-   C8          INY
083B-   D0 EF       BNE   $082C
083D-   EA          NOP
083E-   EA          NOP

; reset stack pointer
083F-   A2 FF       LDX   #$FF
0841-   EA          NOP
0842-   EA          NOP
0843-   EA          NOP
0844-   9A          TXS
0845-   EA          NOP
0846-   EA          NOP

; and exit
0847-   60          RTS

Wait, what?

Here's what: we decrypted $100 bytes
and stored them starting at $60. Only
$A0 of them fit in zero page
($60..$FF); the other $60 bytes landed
in $0100..$015F, the stack page. Then
we reset the stack pointer, then we
"returned." The stack pointer wrapped
around to $00, and whatever ended up at
$0100 serves as a "return" address
(minus 1, as usual).
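A quick Python model of the loop at
$082C shows where the decrypted bytes
land. The input is a placeholder (all
zeroes), not the real bytes from
$0850..$094F, and the function name is
made up:

```python
def decrypt_boot0(encrypted):
    # LDA $0850,Y / EOR #$A5 /
    # STA $0060,Y for Y = $00..$FF.
    # Absolute indexed STA does not
    # wrap within zero page, so the
    # writes run from $0060 to $015F.
    assert len(encrypted) == 256
    return {0x0060 + y: b ^ 0xA5
            for y, b in enumerate(encrypted)}


mem = decrypt_boot0(bytes(256))
zp = [a for a in mem if a < 0x100]
stack = [a for a in mem if a >= 0x100]
print(hex(len(zp)), hex(len(stack)))
print(hex(max(mem)))
```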

Let's find out what that is.

*9600<C600.C6FFM

; set up callback after decryption loop
96F8-   A9 4C       LDA   #$4C
96FA-   8D 45 08    STA   $0845
96FD-   A9 0A       LDA   #$0A
96FF-   8D 46 08    STA   $0846
9702-   A9 97       LDA   #$97
9704-   8D 47 08    STA   $0847

; start the boot
9707-   4C 01 08    JMP   $0801

; callback is here -- copy decrypted
; code/data to graphics page so it
; survives a reboot
970A-   A0 00       LDY   #$00
970C-   B9 60 00    LDA   $0060,Y
970F-   99 60 20    STA   $2060,Y
9712-   C8          INY
9713-   D0 F7       BNE   $970C

; turn off the slot 6 drive motor
9715-   AD E8 C0    LDA   $C0E8

; reboot to my work disk
9718-   4C 00 C5    JMP   $C500

*BSAVE TRACE2,A$9600,L$11B
*9600G
...reboots slot 6...
...reboots slot 5...
]BSAVE OBJ.0060-015F,A$2060,L$100
]CALL -151

*2100.210F

2100- B3 00 5F 00 FF 03 00 04
2108- FF 8B FE 07 FF 03 FF 5F

This is what ends up at $0100. The
first two bytes are $B3/$00, so
execution continues at $00B4. At the
next RTS, it will jump to $0060, then
$0400, then $0401, &c.
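Each "return" is just RTS doing its
usual thing: pull the low byte, pull
the high byte, add 1. Here's a small
Python model (the helper name is made
up) that walks the first eight bytes
of the stack page dumped above:

```python
def rts_chain(stack, count):
    # stack[i] is the byte at $0100+i.
    # TXS with X=$FF left SP at $FF.
    sp = 0xFF
    targets = []
    for _ in range(count):
        sp = (sp + 1) & 0xFF
        lo = stack[sp]
        sp = (sp + 1) & 0xFF
        hi = stack[sp]
        # RTS resumes at popped + 1
        addr = ((hi << 8) | lo) + 1
        targets.append(addr & 0xFFFF)
    return targets


page1 = [0xB3, 0x00, 0x5F, 0x00,
         0xFF, 0x03, 0x00, 0x04]
for t in rts_chain(page1, 4):
    print(f"${t:04X}")
```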

So what's at $00B4? That's in memory at
$20B4.

*20B4L

; no idea what this is doing, but I'm
; sure it'll become clear soon enough
20B4-   A2 D4       LDX   #$D4
20B6-   86 00       STX   $00
20B8-   E8          INX
20B9-   86 01       STX   $01
20BB-   E8          INX
20BC-   86 02       STX   $02
20BE-   E8          INX
20BF-   86 03       STX   $03
20C1-   A9 04       LDA   #$04
20C3-   AA          TAX
20C4-   60          RTS

Another RTS. Now we jump to $0060,
which is in memory at $2060.

*2060L

2060-   86 3E       STX   $3E
2062-   85 3A       STA   $3A
2064-   A6 3E       LDX   $3E
2066-   86 40       STX   $40
2068-   A0 00       LDY   #$00
206A-   A5 3A       LDA   $3A
206C-   84 3C       STY   $3C
206E-   85 3D       STA   $3D

X and A came in as $04, and Y was just
set to $00 at $0068, so zero page ends
up as

  $00 = $D4
  $01 = $D5
  $02 = $D6
  $03 = $D7
  ...
  $08 = slot number x16 (e.g. $60)
  ...
  $3A = $04
  ...
  $3C = $00
  $3D = $04
  $3E = $04
  ...
  $40 = $04

; slot number x16
2070-   A6 08       LDX   $08

; subroutine just reads a nibble
2072-   20 AE 00    JSR   $00AE

; Ah, that zero page initialization at
; $00B4 makes sense now. Those values
; constitute a custom prologue to read
; the rest of track $00: "D4 D5 D6"
2075-   C5 00       CMP   $00
2077-   D0 F9       BNE   $2072
2079-   20 AE 00    JSR   $00AE
207C-   C5 01       CMP   $01
207E-   D0 F5       BNE   $2075
2080-   20 AE 00    JSR   $00AE
2083-   C5 02       CMP   $02
2085-   D0 F5       BNE   $207C

; decode 4-4 encoded sector data
2087-   BD 8C C0    LDA   $C08C,X
208A-   10 FB       BPL   $2087
208C-   2A          ROL
208D-   85 3F       STA   $3F
208F-   BD 8C C0    LDA   $C08C,X
2092-   10 FB       BPL   $208F
2094-   25 3F       AND   $3F

; store in $0400 (text page)
2096-   91 3C       STA   ($3C),Y
2098-   C8          INY
2099-   D0 EC       BNE   $2087
209B-   0E 00 C0    ASL   $C000

; and a one-nibble epilogue
209E-   BD 8C C0    LDA   $C08C,X
20A1-   10 FB       BPL   $209E
20A3-   C5 03       CMP   $03
20A5-   D0 BD       BNE   $2064

; increment page
20A7-   E6 3D       INC   $3D

; decrement sector count
20A9-   C6 40       DEC   $40
20AB-   D0 DA       BNE   $2087
20AD-   60          RTS

So we're reading 4 sectors into $0400,
then "returning" again. According to
the stack, execution continues at
$0400, which I don't have yet.
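The 4-4 decode at $0087 deserves a
closer look. Each byte is stored as
two nibbles: one carries the odd bits,
the other the even bits, and the
remaining bits are forced to 1 so
every nibble has its high bit set.
The carry is always set going into the
ROL (first by the matching CMP, then by
each ROL shifting out a nibble's high
bit), so ROL-then-AND rebuilds the
byte. The encoder below is my own
illustration of the format, not code
from this disk:

```python
def encode_44(byte):
    # odd bits in the first nibble,
    # even bits in the second; the
    # rest are forced to 1 ($AA)
    return ((byte >> 1) | 0xAA,
            byte | 0xAA)


def decode_44(n1, n2):
    # ROL with carry set = (n1 << 1)|1,
    # which lines the odd bits up;
    # AND n2 keeps the even bits
    return ((n1 << 1) | 1) & n2 & 0xFF


for b in (0x00, 0x4C, 0xA5, 0xFF):
    n1, n2 = encode_44(b)
    print(f"{b:02X} -> {n1:02X} {n2:02X}"
          f" -> {decode_44(n1, n2):02X}")
```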

*9600<C600.C6FFM

; set up callback #1
96F8-   A9 4C       LDA   #$4C
96FA-   8D 45 08    STA   $0845
96FD-   A9 0A       LDA   #$0A
96FF-   8D 46 08    STA   $0846
9702-   A9 97       LDA   #$97
9704-   8D 47 08    STA   $0847

; start the boot
9707-   4C 01 08    JMP   $0801

; (callback #1) set up callback #2
; after reading into text page by
; directly modifying the stack page
970A-   A9 14       LDA   #$14
970C-   8D 04 01    STA   $0104
970F-   A9 97       LDA   #$97
9711-   8D 05 01    STA   $0105

; "RTS" to continue the boot
9714-   60          RTS

; (callback #2) copy text page to
; graphics page so it survives a
; reboot
9715-   A2 04       LDX   #$04
9717-   A0 00       LDY   #$00
9719-   B9 00 04    LDA   $0400,Y
971C-   99 00 24    STA   $2400,Y
971F-   C8          INY
9720-   D0 F7       BNE   $9719
9722-   EE 1B 97    INC   $971B
9725-   EE 1E 97    INC   $971E
9728-   CA          DEX
9729-   D0 EE       BNE   $9719

; turn off slot 6 drive motor
972B-   AD E8 C0    LDA   $C0E8

; reboot to my work disk
972E-   4C 00 C5    JMP   $C500

*BSAVE TRACE3,A$9600,L$131
*9600G
...reboots slot 6...
...reboots slot 5...
]BSAVE OBJ.0400-07FF,A$2400,L$400

                   ~

               Chapter 3
     You Know What They Say About
      The Calm Before The Storm?
           This Is The Calm


The last instruction we executed was an
RTS (at $00AD). Once again, we look to
the stack to see where execution
continues, and it continues at $0400. I
have $0400..$07FF in memory at $2400.

Here we go.

]CALL -151

*2400L

2400-   60          RTS

I swear I am not making this up.

Looking to the stack once more, we see
that execution continues at $0401, in
memory at $2401.

2401-   EA          NOP
2402-   EA          NOP
2403-   20 E0 07    JSR   $07E0
2406-   EA          NOP
2407-   EA          NOP

*27E0L

; copy the ROM ($D000..$FFFF) into the
; language card (read ROM, write RAM
; bank 2)
27E0-   AD 81 C0    LDA   $C081
27E3-   AD 81 C0    LDA   $C081
27E6-   A0 00       LDY   #$00
27E8-   A9 D0       LDA   #$D0
27EA-   84 00       STY   $00
27EC-   85 01       STA   $01
27EE-   B1 00       LDA   ($00),Y
27F0-   91 00       STA   ($00),Y
27F2-   C8          INY
27F3-   D0 F9       BNE   $27EE
27F5-   E6 01       INC   $01
27F7-   D0 F5       BNE   $27EE
27F9-   AD 80 C0    LDA   $C080
27FC-   60          RTS

*2408L

; read/write RAM bank 2
2408-   AD 83 C0    LDA   $C083
240B-   AD 83 C0    LDA   $C083

; move some code to $0200
240E-   A0 00       LDY   #$00
2410-   B9 00 07    LDA   $0700,Y
2413-   99 00 02    STA   $0200,Y
2416-   C8          INY
2417-   D0 F7       BNE   $2410

*2700L

; standard Broderbund Badlands, starts
; by writing a single character to the
; upper-left corner of the screen
2700-   A9 D2       LDA   #$D2
2702-   2C A9 D0    BIT   $D0A9
2705-   2C A9 CC    BIT   $CCA9
2708-   2C A9 A1    BIT   $A1A9
270B-   48          PHA
270C-   20 E0 02    JSR   $02E0
270F-   20 2F FB    JSR   $FB2F
2712-   20 58 FC    JSR   $FC58
2715-   20 84 FE    JSR   $FE84
2718-   68          PLA
2719-   8D 00 04    STA   $0400

; wipe main memory
271C-   A0 00       LDY   #$00
271E-   98          TYA
271F-   99 00 BF    STA   $BF00,Y
2722-   C8          INY
2723-   D0 FA       BNE   $271F
2725-   CE 21 02    DEC   $0221

; play a sound
2728-   AD 21 02    LDA   $0221
272B-   AA          TAX
272C-   2C 30 C0    BIT   $C030
272F-   EA          NOP
2730-   EA          NOP
2731-   EA          NOP
2732-   C9 08       CMP   #$08
2734-   B0 E6       BCS   $271C

; munge the reset vector
2736-   8D F3 03    STA   $03F3
2739-   8D F4 03    STA   $03F4

; reboot from whence we came
273C-   AD FF 02    LDA   $02FF
273F-   4A          LSR
2740-   4A          LSR
2741-   4A          LSR
2742-   4A          LSR
2743-   09 C0       ORA   #$C0
2745-   E9 00       SBC   #$00
2747-   48          PHA
2748-   A9 FF       LDA   #$FF
274A-   48          PHA
274B-   60          RTS

So, uh, let's try not to end up there.
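Two tricks in The Badlands are easy to
misread, so here they are as Python
(function names made up). The reboot
turns the saved slot x16 into a $Cn00
address: the last LSR shifts out a 0,
so the carry is clear and SBC #$00
subtracts 1, which the RTS adds back.
And the reset-vector munge works
because the Autostart ROM only honors
$03F2/$03F3 on reset when $03F4 equals
$03F3 EOR $A5; storing the same value
in both bytes can never pass that
check.

```python
def badlands_reboot(slot16):
    # LDA $02FF / LSR x4 / ORA #$C0
    a = (slot16 >> 4) | 0xC0
    # carry is clear after the LSRs,
    # so SBC #$00 subtracts 1
    a = (a - 1) & 0xFF
    # PHA / LDA #$FF / PHA / RTS:
    # RTS jumps to popped address + 1
    return ((a << 8) | 0xFF) + 1


def reset_is_warm(v3f3, v3f4):
    # Autostart ROM power-up byte test
    return v3f4 == v3f3 ^ 0xA5


print(hex(badlands_reboot(0x60)))
# $0419 sets $03F3=$02, $03F4=$A7
print(reset_is_warm(0x02, 0xA7))
# the wipe loop ends with A=$07, and
# both STAs store that same value
print(reset_is_warm(0x07, 0x07))
```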

*2419L

; set reset vector to The Badlands
2419-   A9 02       LDA   #$02
241B-   8C FC FF    STY   $FFFC
241E-   8D FD FF    STA   $FFFD
2421-   8C F2 03    STY   $03F2
2424-   8D F3 03    STA   $03F3
2427-   49 A5       EOR   #$A5
2429-   8D F4 03    STA   $03F4

; also set input and output vectors to
; The Badlands
242C-   A0 03       LDY   #$03
242E-   A9 02       LDA   #$02
2430-   84 36       STY   $36
2432-   85 37       STA   $37
2434-   84 38       STY   $38
2436-   85 39       STA   $39

; also the BRK vector
2438-   8C F0 03    STY   $03F0
243B-   8D F1 03    STA   $03F1
243E-   A9 00       LDA   #$00
2440-   85 0A       STA   $0A

; save boot slot (x16) again
2442-   A6 2B       LDX   $2B
2444-   8E FF 02    STX   $02FF

2447-   EA          NOP
2448-   EA          NOP
2449-   EA          NOP
244A-   A9 1A       LDA   #$1A
244C-   20 03 06    JSR   $0603

*2603L

2603-   4C 5D 06    JMP   $065D

*265DL
...

Oh, it's a standard track seek routine.
The accumulator is the phase (track x2)
and it returns once it seeks to that
track.

So we're on track $0D.
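A phase is half a track, and half a
phase is a quarter track. This small
Python helper (mine, not the loader's)
turns phase numbers into the track
notation used in this write-up:

```python
QUARTER = {0: "0", 0.5: "25",
           1: "5", 1.5: "75"}


def fmt_track(phase):
    # phase counts half tracks, so a
    # half phase is a quarter track
    whole, frac = divmod(phase, 2)
    return (f"${int(whole):02X}."
            + QUARTER[frac])


print(fmt_track(0x1A))  # $0D.0
print(fmt_track(0x1B))  # $0D.5
print(fmt_track(0x42))  # $21.0
```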

                   ~

               Chapter 4
         And This Is The Storm


Continuing from $044F...

*244FL

244F-   EA          NOP
2450-   EA          NOP
2451-   20 50 05    JSR   $0550

*2550L

; a loop, of sorts
2550-   A0 00       LDY   #$00
2552-   84 05       STY   $05
2554-   84 06       STY   $06
2556-   B9 70 05    LDA   $0570,Y
2559-   F0 09       BEQ   $2564
255B-   20 C0 05    JSR   $05C0
255E-   A4 06       LDY   $06
2560-   C8          INY
2561-   D0 F1       BNE   $2554
2563-   00          BRK
2564-   60          RTS

*2570.

2570- 0F 17 60 68 70 78 80 88
2578- 98 A0 A8 B0 B8 00 .. ..

Those look like addresses, or at least
the high byte of addresses, in main
memory. If $05C0 loads 8 pages at a
time, we'll read into $0F00..$1EFF,
$6000..$8FFF, and $9800..$BFFF, with a
gap at $9000..$97FF.
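Here's the page arithmetic for that
table in Python, assuming (as guessed
above) 8 pages per entry. It merges
adjacent chunks into ranges and totals
the bytes; the names are made up:

```python
LOAD_TABLE = [0x0F, 0x17, 0x60, 0x68,
              0x70, 0x78, 0x80, 0x88,
              0x98, 0xA0, 0xA8, 0xB0,
              0xB8, 0x00]


def load_ranges(table, pages=8):
    ranges = []
    for hi in table:
        if hi == 0:
            # the $00 terminator
            break
        start = hi << 8
        end = start + pages * 0x100 - 1
        if ranges and ranges[-1][1] + 1 == start:
            # contiguous: extend it
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


ranges = load_ranges(LOAD_TABLE)
for s, e in ranges:
    print(f"${s:04X}..${e:04X}")
total = sum(e - s + 1 for s, e in ranges)
print(f"${total:04X} bytes")
```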

The loop terminates after we find the
$00 byte (at $057D, when Y = $0D) and
branch to $0564 (from $0559). If for
some reason we never get a $00 byte, we
eventually hit a BRK at $0563, which is
a very '80s way of saying
"ASSERT(THIS_SHOULD_NEVER_HAPPEN)".

Let's see what's at $05C0.

*25C0L

25C0-   A0 04       LDY   #$04
25C2-   84 04       STY   $04

; push the address high byte to the
; stack
25C4-   48          PHA
25C5-   A2 01       LDX   #$01
25C7-   20 80 05    JSR   $0580

*2580L

; push the address high byte to the
; stack... again?
2580-   48          PHA
2581-   A5 05       LDA   $05
2583-   29 07       AND   #$07
2585-   A8          TAY
2586-   B9 F8 06    LDA   $06F8,Y

$06F8 looks like this:

*26F8.

26F8- 96 97 9A 9B 9D 9E 9F CB

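; Ah, we're building a new prologue in
; zero page: a nibble from that table
; (indexed by the low 3 bits of the
; sector counter at $05) goes in $00,
; and the counter itself, 4-4 encoded,
; goes in $01/$02. Earlier, $00..$02
; held the prologue for the read
; routine at $0060; now every sector
; gets a different one.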
2589-   85 00       STA   $00
258B-   A5 05       LDA   $05
258D-   4A          LSR
258E-   09 AA       ORA   #$AA
2590-   85 01       STA   $01
2592-   A5 05       LDA   $05
2594-   09 AA       ORA   #$AA
2596-   85 02       STA   $02

; pop the address high byte
2598-   68          PLA
2599-   E6 05       INC   $05
259B-   A2 01       LDX   #$01

; and continue elsewhere
259D-   4C 00 06    JMP   $0600
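The upshot: every sector gets its own
3-nibble prologue, built from the
sector counter at $05. A Python sketch
of the calculation at $0580 (the names
are mine):

```python
PROLOGUE_TABLE = [0x96, 0x97, 0x9A, 0x9B,
                  0x9D, 0x9E, 0x9F, 0xCB]


def sector_prologue(counter):
    # $00: table at $06F8, indexed by
    #      the low 3 bits of $05
    # $01/$02: $05 itself, 4-4 encoded
    counter &= 0xFF
    return (PROLOGUE_TABLE[counter & 7],
            (counter >> 1) | 0xAA,
            counter | 0xAA)


for n in (0, 1, 2, 3):
    print(n, " ".join(f"{x:02X}" for x
                      in sector_prologue(n)))
```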

*2600L

2600-   4C 09 06    JMP   $0609

*2609L

2609-   86 3E       STX   $3E
260B-   85 3A       STA   $3A
260D-   A6 3E       LDX   $3E
260F-   86 40       STX   $40
2611-   A0 00       LDY   #$00
2613-   A5 3A       LDA   $3A

; ($3C) now points to the target page
; in memory (low byte is #$00, high
; byte is the one we got from the array
; at $0570 and kept pushing and popping
; on the stack)
2615-   84 3C       STY   $3C
2617-   85 3D       STA   $3D

; $08 contains the boot slot (x16)
2619-   A6 08       LDX   $08

; subroutine (not shown) reads 1 nibble
; with a standard "LDA / BPL" loop
261B-   20 57 06    JSR   $0657

; match the 3-nibble prologue stored in
; zero page $00/$01/$02
261E-   C5 00       CMP   $00
2620-   D0 F9       BNE   $261B
2622-   20 57 06    JSR   $0657
2625-   C5 01       CMP   $01
2627-   D0 F5       BNE   $261E
2629-   20 57 06    JSR   $0657
262C-   C5 02       CMP   $02
262E-   D0 F5       BNE   $2625

; read 4-4 encoded data
2630-   BD 8C C0    LDA   $C08C,X
2633-   10 FB       BPL   $2630
2635-   2A          ROL
2636-   85 3F       STA   $3F
2638-   BD 8C C0    LDA   $C08C,X
263B-   10 FB       BPL   $2638
263D-   25 3F       AND   $3F

; store it in memory
263F-   91 3C       STA   ($3C),Y
2641-   C8          INY
2642-   D0 EC       BNE   $2630
2644-   0E 00 C0    ASL   $C000

; verify a 1-nibble epilogue
2647-   BD 8C C0    LDA   $C08C,X
264A-   10 FB       BPL   $2647
264C-   C5 03       CMP   $03
264E-   D0 BD       BNE   $260D

; increment the target page
2650-   E6 3D       INC   $3D

; decrement the sector count
2652-   C6 40       DEC   $40

; loop until done
2654-   D0 DA       BNE   $2630
2656-   60          RTS

The sector count is in zero page $40,
which is set at $060F from zero page
$3E, which is set at $0609 from the X
register, which is set at $059B to...
#$01. We are reading 1 sector.

Continuing from $05CA...

; $0A is the current phase (track x2),
; initialized at $0440 and updated in
; the track seek routine at $065D. It
; looks like we're seeking to the next
; half track (track $0D.5).
25CA-   A5 0A       LDA   $0A
25CC-   18          CLC
25CD-   69 01       ADC   #$01
25CF-   20 06 06    JSR   $0606

*2606L

2606-   4C D6 06    JMP   $06D6

*26D6L

; But wait! We're not calling the track
; seek routine directly. First we're
; fiddling with something inside that
; routine, at $06B4, then seeking, then
; fiddling with $06B4 again.
26D6-   A2 0D       LDX   #$0D
26D8-   8E B4 06    STX   $06B4
26DB-   20 5D 06    JSR   $065D
26DE-   A9 13       LDA   #$13
26E0-   8D B4 06    STA   $06B4
26E3-   60          RTS

What's going on?

Here's the byte we're fiddling with:

26B3-   A2 13       LDX   #$13    <-- !
26B5-   CA          DEX
26B6-   D0 FD       BNE   $26B5
26B8-   38          SEC
26B9-   E9 01       SBC   #$01
26BB-   D0 F6       BNE   $26B3
26BD-   60          RTS

It's part of a standard wait routine
(much like the one used in DOS 3.3)
that is used to wait an exact number of
CPU cycles between hitting stepper
motors. This is extremely low level
stuff inside the subroutine that's
inside the subroutine that's inside the
RWTS that you

Absolutely. Should. Not. Fiddle. With.

So what does it even mean to change the
timing of this routine? It means that,
after we hit the stepper motor to tell
the drive head to start moving in one
direction, we're not waiting long
enough for it to get to the next
"stopping point," which is the next
full phase. We were on track $0D, so
seeking one phase forward should get us
to track $0D.5. But with the reduced
timing in this low-level wait routine,
we don't get all the way to $0D.5. We
stop somewhere around $0D.25.

We're stepping by quarter tracks.
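Some back-of-the-envelope cycle
counting shows how big the change is.
This Python sketch uses standard 6502
timings (LDX/SBC immediate and DEX/SEC
are 2 cycles, BNE is 3 if taken and 2
if not) and assumes no branch crosses
a page boundary, which holds here. It
is my arithmetic, not code from the
disk:

```python
def wait_cycles(a, n):
    # Cycles from LDX #n at $06B3
    # through the final BNE at $06BB,
    # with a and n each 1..255
    # (JSR and RTS not included).
    inner = 5 * n - 1   # DEX/BNE loop
    # LDX + inner + SEC + SBC + BNE
    per_pass = 2 + inner + 2 + 2 + 3
    # the last BNE falls through,
    # costing 2 cycles instead of 3
    return a * per_pass - 1


normal = wait_cycles(1, 0x13)
short = wait_cycles(1, 0x0D)
print(normal, short)
ratio = (wait_cycles(10, 0x0D)
         / wait_cycles(10, 0x13))
print(round(ratio, 3))
```

Patched, every delay is roughly 71% of
normal, which fits the idea that the
head stops short of the next half
track.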

Continuing from $05D2...

; increment the target page (on the top
; of the stack at this point) and push
; it back to the stack
25D2-   68          PLA
25D3-   18          CLC
25D4-   69 01       ADC   #$01
25D6-   48          PHA

; read 1 more sector
25D7-   20 80 05    JSR   $0580

; seek backwards, but again with the
; reduced timing, so we'll end up back
; on track $0D(.0)
25DA-   A5 0A       LDA   $0A
25DC-   38          SEC
25DD-   E9 01       SBC   #$01
25DF-   20 06 06    JSR   $0606

; loop back until zero page $04 (set to
; #$04 at $05C2) hits zero: 4 passes,
; 2 sectors each, 8 sectors in all
25E2-   68          PLA
25E3-   18          CLC
25E4-   69 01       ADC   #$01
25E6-   C6 04       DEC   $04
25E8-   D0 DA       BNE   $25C4

; seek forward 3 phases (1.5 tracks),
; with normal timing, so we end up on
; track $0E.5 and return to the main
; loop to do it all over again
25EA-   48          PHA
25EB-   A5 0A       LDA   $0A
25ED-   18          CLC
25EE-   69 03       ADC   #$03
25F0-   20 03 06    JSR   $0603
25F3-   68          PLA
25F4-   60          RTS

In case you got lost, this was the
"main loop" at $0550:

2550-   A0 00       LDY   #$00
2552-   84 05       STY   $05
2554-   84 06       STY   $06
2556-   B9 70 05    LDA   $0570,Y
2559-   F0 09       BEQ   $2564
255B-   20 C0 05    JSR   $05C0
255E-   A4 06       LDY   $06
2560-   C8          INY
2561-   D0 F1       BNE   $2554
2563-   00          BRK
2564-   60          RTS

As we suspected, the subroutine at
$05C0 does read 8 sectors at a time.
What we didn't know until now is that
it reads them in a zig-zag pattern from
adjacent quarter tracks, like this:

0C.75   0D.0    0D.25   0D.5    0D.75
--+-------+-------+-------+-------+----
  .      0F00     .       .       .
  .       .   \   .       .       .
  .       .      1000     .       .
  .       .   /   .       .       .
  .      1100     .       .       .
  .       .   \   .       .       .
  .       .      1200     .       .
  .       .   /   .       .       .
  .      1300     .       .       .
  .       .   \   .       .       .
  .       .      1400     .       .
  .       .   /   .       .       .
  .      1500     .       .       .
  .       .   \   .       .       .
  .       .      1600     .       .

This explains the little "fluttering"
noise the original disk makes while
booting. It also explains why the game
loads so slowly, even though the code
itself is only $6800 bytes.
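To double-check the pattern, here's a
Python simulation of one call to
$05C0, tracking the head in quarter
tracks. It assumes each short seek
lands exactly one quarter track away,
which is my reading of the timing
change rather than something the code
guarantees. The names are made up:

```python
QTR = ("0", "25", "5", "75")


def fmt_q(q):
    # q counts quarter tracks
    return f"${q // 4:02X}.{QTR[q % 4]}"


def zigzag(first_page, q):
    # 4 passes: read at q, short step
    # out, read at q+1, short step back
    reads = []
    page = first_page
    for _ in range(4):
        reads.append((page, q))
        reads.append((page + 1, q + 1))
        page += 2
    # then 3 full phases forward, i.e.
    # 6 quarter tracks
    return reads, q + 6


reads, nxt = zigzag(0x0F, 0x0D * 4)
for page, q in reads:
    print(f"${page:02X}00 at {fmt_q(q)}")
print("next group at", fmt_q(nxt))
```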

                   ~

               Chapter 5
       In Which We Triumphantly
         Fall Flat On Our Ass


When we return from the "main loop" at
$0550, we've read the entire game into
memory. There's more code after that,
but not much.

Continuing from $0454...

; reset the prologue and epilogue
; nibbles in zero page
2454-   EA          NOP
2455-   EA          NOP
2456-   A9 DD       LDA   #$DD
2458-   85 00       STA   $00
245A-   A9 F5       LDA   #$F5
245C-   85 01       STA   $01
245E-   A9 D5       LDA   #$D5
2460-   85 02       STA   $02
2462-   A9 D4       LDA   #$D4
2464-   85 03       STA   $03
2466-   EA          NOP
2467-   EA          NOP

; seek to track $21 (normal timing)
2468-   A9 42       LDA   #$42
246A-   20 03 06    JSR   $0603

; read 1 final sector
246D-   A9 04       LDA   #$04
246F-   A2 01       LDX   #$01
2471-   20 00 06    JSR   $0600

; jump to game
2474-   4C 41 85    JMP   $8541

That's it. We made it. We even kind of
understand it. Let's capture it.

]PR#5
]CALL -151

*9600<C600.C6FFM

[first part same as the previous trace]

; (callback #2) change final JMP at
; $0474 to break to the monitor
9715-   A9 59       LDA   #$59
9717-   8D 75 04    STA   $0475
971A-   A9 FF       LDA   #$FF
971C-   8D 76 04    STA   $0476

; "RTS" to continue the boot
971F-   60          RTS

*BSAVE TRACE4,A$9600,L$120
*9600G
...reboots slot 6...
...never breaks to monitor...
...continues to game...

I've missed something. There's f*ckery
afoot.

                   ~

               Chapter 6
             F*ckery Afoot


After much head-scratching, I finally
saw what I was missing. This code looks
innocuous enough:

; read 1 final sector
246D-   A9 04       LDA   #$04
246F-   A2 01       LDX   #$01
2471-   20 00 06    JSR   $0600

; jump to game
2474-   4C 41 85    JMP   $8541

But wait a minute. The read routine at
$0600 takes two parameters: the number
of sectors in X (1), and the target
address in A ($04, so $0400). But we're
executing from $0400! That means the
final sector read will overwrite this
code and return to the code it just
read from track $21. Which means...

THE "JMP $8541" IS FAKE.

To find the *real* entry point, we get
to change the target address at $046E
so it reads the sector into another
address, doesn't overwrite the calling
code, and returns to the same $0474 we
see in the listing (and which we can
trap).

]PR#5
]CALL -151

*9600<C600.C6FFM

; set up callback #1
96F8-   A9 4C       LDA   #$4C
96FA-   8D 45 08    STA   $0845
96FD-   A9 0A       LDA   #$0A
96FF-   8D 46 08    STA   $0846
9702-   A9 97       LDA   #$97
9704-   8D 47 08    STA   $0847

; start the boot
9707-   4C 01 08    JMP   $0801

; (callback #1) set up callback #2
; after reading into text page by
; directly modifying the stack page
970A-   A9 14       LDA   #$14
970C-   8D 04 01    STA   $0104
970F-   A9 97       LDA   #$97
9711-   8D 05 01    STA   $0105

; "RTS" to continue the boot
9714-   60          RTS

; (callback #2) change address of the
; final sector read to $2400 (unused)
; so execution returns to the following
; instruction
9715-   A9 24       LDA   #$24
9717-   8D 6E 04    STA   $046E

; change the final JMP to break to the
; monitor -- for real this time
971A-   A9 59       LDA   #$59
971C-   8D 75 04    STA   $0475
971F-   A9 FF       LDA   #$FF
9721-   8D 76 04    STA   $0476

; "RTS" to continue the boot
9724-   60          RTS

*BSAVE TRACE4,A$9600,L$125
*9600G
...reboots slot 6...
...breaks to monitor...

*4000<A000.BFFFM
*C500G
...

]BSAVE OBJ.0F00-1EFF,A$F00,L$1000
]BSAVE OBJ.6000-8FFF,A$6000,L$3000
]BSAVE OBJ.9800-9FFF,A$9800,L$800
]BSAVE OBJ.A000-BFFF,A$4000,L$2000
]BSAVE OBJ.0400-04FF,A$2400,L$100

]CALL -151

*2474L

; turn off the drive motor
2474-   BD 88 C0    LDA   $C088,X

; and run the sector we just read
2477-   4C 00 04    JMP   $0400

This is what the original disk executes
after reading the final sector from
track $21.

*2400L

; calculate a checksum of all the game
; code
2400-   A0 00       LDY   #$00
2402-   84 00       STY   $00
2404-   84 01       STY   $01
2406-   84 03       STY   $03
2408-   84 02       STY   $02
240A-   B9 70 05    LDA   $0570,Y
240D-   F0 21       BEQ   $2430
240F-   85 04       STA   $04
2411-   A2 08       LDX   #$08
2413-   A0 00       LDY   #$00
2415-   B1 03       LDA   ($03),Y
2417-   45 00       EOR   $00
2419-   85 00       STA   $00
241B-   B1 03       LDA   ($03),Y
241D-   18          CLC
241E-   65 01       ADC   $01
2420-   85 01       STA   $01
2422-   C8          INY
2423-   D0 F0       BNE   $2415
2425-   E6 04       INC   $04
2427-   CA          DEX
2428-   D0 EB       BNE   $2415
242A-   A4 02       LDY   $02
242C-   C8          INY
242D-   D0 D9       BNE   $2408
242F-   00          BRK

; if checksum verification fails, it's
; off to The Badlands with you
2430-   AD FE 04    LDA   $04FE
2433-   45 00       EOR   $00
2435-   F0 03       BEQ   $243A
2437-   4C 06 02    JMP   $0206
243A-   AD FF 04    LDA   $04FF
243D-   45 01       EOR   $01
243F-   D0 F6       BNE   $2437
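In other words, $00 accumulates an EOR
of every byte in the pages listed at
$0570, $01 accumulates their sum
modulo 256 (the CLC before each ADC
discards carries), and both are
checked against $04FE/$04FF. A Python
sketch, run on a made-up memory image
with made-up function names:

```python
def game_checksum(mem, table, pages=8):
    # mem is a 64K bytearray
    eor = total = 0
    for hi in table:
        if hi == 0:
            # $00 ends the table
            break
        start = hi << 8
        for b in mem[start:start + pages * 0x100]:
            eor ^= b
            total = (total + b) & 0xFF
    return eor, total


def checksum_ok(mem, table):
    # compare with $04FE (EOR) and
    # $04FF (sum), like $0430..$043F
    eor, total = game_checksum(mem, table)
    return (mem[0x04FE], mem[0x04FF]) == (eor, total)


mem = bytearray(0x10000)
mem[0x0F00] = 0x12
mem[0x6000] = 0x34
table = [0x0F, 0x60, 0x00]
print([hex(x) for x in game_checksum(mem, table)])
```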

; otherwise we set up... a DOS-shaped
; RWTS?!?!?
2441-   AE FF 02    LDX   $02FF
2444-   8E E9 B7    STX   $B7E9
2447-   8E F7 B7    STX   $B7F7

Yes, it turns out that the game loads a
mostly-normal DOS-shaped RWTS at $B800.
Presumably this is used for reading the
level data from tracks $03-$0C, plus
reading and writing user data disks.
(You did know that Lode Runner contains
a full level editor, right? Press
<Ctrl-E> at the title screen and you
can initialize your own data disk and
build and save your own levels.)

244A-   EA          NOP
244B-   20 80 04    JSR   $0480

*2480L

; tell the DOS-shaped RWTS that we're
; on track $0C
2480-   8A          TXA
2481-   4A          LSR
2482-   4A          LSR
2483-   4A          LSR
2484-   4A          LSR
2485-   AA          TAX
2486-   A9 18       LDA   #$18
2488-   9D 78 04    STA   $0478,X

; use the bootloader track seek routine
; to actually seek to track $0C (normal
; timing)
248B-   4C 03 06    JMP   $0603

This prevents the disk from grinding
when we switch over to the DOS-shaped
RWTS to read level 1.
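The arithmetic at $0480 in Python: the
four LSRs turn slot x16 into a plain
slot number, and the phase (track x2)
lands at $0478 + slot. The example
assumes slot 6, and the helper name is
made up:

```python
def track_marker(slot16, track):
    # returns (address, value) that
    # $0480 stores: $0478 + slot,
    # phase = track x2
    slot = slot16 >> 4
    return 0x0478 + slot, track * 2


addr, value = track_marker(0x60, 0x0C)
print(f"${addr:04X} = ${value:02X}")
```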

; game-specific zero page setup, which
; I'm assuming is important
244E-   EA          NOP
244F-   EA          NOP
2450-   A9 06       LDA   #$06
2452-   85 8C       STA   $8C
2454-   A9 FF       LDA   #$FF
2456-   85 99       STA   $99
2458-   A9 CA       LDA   #$CA
245A-   85 95       STA   $95
245C-   A9 4C       LDA   #$4C
245E-   85 23       STA   $23
2460-   A9 50       LDA   #$50
2462-   85 36       STA   $36
2464-   A9 8E       LDA   #$8E
2466-   85 37       STA   $37

; ha ha, Roland loves pointing the
; input vector ($38/$39) at the RWTS
; entry point, so he can "read a key"
; to read a sector
2468-   A9 B5       LDA   #$B5
246A-   85 38       STA   $38
246C-   A9 B7       LDA   #$B7
246E-   85 39       STA   $39

; this is the *actual* game entry point
2470-   4C 00 60    JMP   $6000

                   ~

               Chapter 7
      If You Wish To Play A Game,
  You Must First Create The Universe


So far we've focused on the game code
and ignored the level data. Now that we
have all the game code (finally!) we
can use Advanced Demuffin to capture
the level data on tracks $03-$0C.

The RWTS to read the level data is at
$B800, which I captured earlier as part
of the "OBJ.A000-BFFF" file.

]BLOAD OBJ.A000-BFFF,A$4000
]BSAVE RWTS,A$5800,L$800

Now let's run Advanced Demuffin to copy
those tracks to a standard format.

]BRUN ADVANCED DEMUFFIN 1.5

[S6,D1=original disk]
[S6,D2=formatted blank disk]
[S5,D1=my work disk (still)]

["5" to switch to slot 5]

["R" to load a new RWTS module]
  --> At $B8, load "RWTS" from drive 1

["6" to switch to slot 6]

["C" to convert disk]

[press "Y" to change default values]

                 --v--

ADVANCED DEMUFFIN 1.5    (C) 1983, 2014
ORIGINAL BY THE STACK    UPDATES BY 4AM
=======================================


INPUT ALL VALUES IN HEX


SECTORS PER TRACK? (13/16) 16

START TRACK: $03        <-- change this
START SECTOR: $00
END TRACK: $0C          <-- change this
END SECTOR: $0F

INCREMENT: 1

MAX # OF RETRIES: 0

COPY FROM DRIVE 1
TO DRIVE: 2
=======================================
16SC $03,$00-$0C,$0F BY$01 S6,D1->S6,D2

                 --^--

And here we go...

                 --v--

ADVANCED DEMUFFIN 1.5    (C) 1983, 2014
ORIGINAL BY THE STACK    UPDATES BY 4AM
=======PRESS ANY KEY TO CONTINUE=======
TRK:   ..........
+.5:
    0123456789ABCDEF0123456789ABCDEF012
SC0:   ..........
SC1:   ..........
SC2:   ..........
SC3:   ..........
SC4:   ..........
SC5:   ..........
SC6:   ..........
SC7:   ..........
SC8:   ..........
SC9:   ..........
SCA:   ..........
SCB:   ..........
SCC:   ..........
SCD:   ..........
SCE:   ..........
SCF:   ..........
=======================================
16SC $03,$00-$0C,$0F BY$01 S6,D1->S6,D2

                 --^--

Then I grabbed a bog standard RWTS from
a freshly booted DOS 3.3 master disk,
to replace the mostly-but-not-actually-
standard RWTS that the game loads at
$B800.

[S6,D1=DOS 3.3 master disk]
[S5,D1=my work disk]

]PR#6
...
]CALL -151
*2800<B800.BFFFM
*BSAVE DOS33 RWTS,A$2800,L$800,S5,D1

So here's the plan:

I have the level data (untouched) on
tracks $03-$0C. I'm going to write the
game code to tracks $0D-$13. Instead of
the original RWTS at $B800..$BFFF, I'll
substitute the standard DOS 3.3 RWTS
that I captured from the DOS 3.3 master
disk. Then I'll add a bootloader on
track $00 to load everything as quickly
as possible.

The disk map will look like this:

tr | notes
---+-----------------------------------
00 | custom bootloader (see below)
01 | --unused--
02 | --unused--
03 | .
04 | .
05 | .
06 | .
07 | . [level data from original disk]
08 | .
09 | .
0A | .
0B | .
0C | .
0D | $0F00..$1EFF
0E | $6000..$6FFF
0F | $7000..$7FFF
10 | $8000..$8FFF
11 | $9800..$9FFF, plus final $0400
12 | $A000..$AFFF
13 | $B000..$BFFF

The final $0400 sector (from track $21)
contains some final bits of game setup
which I can reuse. I can load it at
$9000 and call $9050 and it will set up
zero page and call the game entry point
at $6000. (I won't call the checksum
routine, nor will I call the track seek
routine -- although I will recreate the
part where it initializes the "what
track are we on" marker so the disk
doesn't grind when loading level 1.)

[S6,D1=demuffin'd disk with T03-T0C]
[S5,D1=my work disk]

]PR#5

First, I wrote a little write loop that
writes all the game code onto tracks
$0D-$13 (but, like, on real tracks, not
zig-zag quarter tracks), in a standard
format. It assumes track $0D data is at
$1000, track $0E at $2000, &c. It's
slow because it's writing sectors in
increasing order, but this is not how
the bootloader will read them. (It will
read them much faster, as we'll see in
a minute.)

*F00L

0F00-   A9 70       LDA   #$70
0F02-   85 FF       STA   $FF
0F04-   A9 00       LDA   #$00
0F06-   85 FE       STA   $FE
0F08-   A9 0F       LDA   #$0F
0F0A-   A0 88       LDY   #$88
0F0C-   20 D9 03    JSR   $03D9
0F0F-   E6 FE       INC   $FE
0F11-   A4 FE       LDY   $FE
0F13-   C0 10       CPY   #$10
0F15-   D0 07       BNE   $0F1E
0F17-   A0 00       LDY   #$00
0F19-   84 FE       STY   $FE
0F1B-   EE 8C 0F    INC   $0F8C
0F1E-   98          TYA
0F1F-   8D 8D 0F    STA   $0F8D
0F22-   EE 91 0F    INC   $0F91
0F25-   C6 FF       DEC   $FF
0F27-   D0 DF       BNE   $0F08
0F29-   60          RTS

*F88.

0F88- 01 60 01 00 0D 00 FB F7
0F90- 00 10 00 00 02 00 00 60
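To double-check the bookkeeping, here
is a minimal Python sketch of the
(track, sector, buffer page) sequence
the loop at $0F00 steps through. It
assumes the IOB layout shown above
(track at $0F8C, sector at $0F8D,
buffer page at $0F91) and leaves out
the actual RWTS call; writer_plan is a
made-up name.

```python
def writer_plan(count=0x70, track=0x0D, page=0x10):
    """Replay the $0F00 loop: one RWTS write
    per sector, then bump the sector (wrapping
    to the next track after $0F) and the page."""
    sector = 0
    plan = []
    for _ in range(count):        # $FF counts down from $70
        plan.append((track, sector, page))
        sector += 1               # INC $FE
        if sector == 0x10:        # CPY #$10
            sector = 0
            track += 1            # INC $0F8C
        page += 1                 # INC $0F91
    return plan


plan = writer_plan()
```

The buffer page runs from $10 to $7F,
which matches the $1000..$7FFF staging
area the writer expects.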

In the interests of reproducibility
(read: because I messed it up the first
time and got tired of retyping all the
BLOAD commands), I made a BASIC program
that loads all the chunks of the game
code into the places where the writer
program looks for them, then calls the
writer at $0F00.

1 D$ =  CHR$ (4)
10 PRINT D$"BLOAD WRITER,A$F00"
20 PRINT D$"BLOAD OBJ.0F00-1EFF,A$1000"
30 PRINT D$"BLOAD OBJ.6000-8FFF,A$2000"
40 PRINT D$"BLOAD OBJ.0400-04FF,A$5000"
50 PRINT D$"BLOAD OBJ.9800-9FFF,A$5800"
60 PRINT D$"BLOAD OBJ.A000-BFFF,A$6000"
70 PRINT D$"BLOAD DOS33 RWTS,A$7800"
80 PRINT "INSERT LODE RUNNER (4AM CRACK
  ) DISK...": GET A$
100 CALL 3840: REM  $F00

]SAVE MAKE
]RUN
...wait for prompt...
...insert disk with T03-T0C level data
...write write write...

And just like that, I have a disk in a
standard format that contains all the
level data and all the game code.

Now, about that bootloader...

Once upon a time, I wrote a little
thing called 4boot. It was fast and
small and I was more than a little bit
proud of it. The boot1 code was a mere
742 bytes and fit in $BD00..$BFFF.

Then qkumba did that thing he does, and
now it fits in zero page.

With his blessing, I present: 0boot v3.

                   ~

               Chapter 8
                 0boot


0boot lives on track $00, just like me.
Sector $00 (boot0) reuses the disk
controller ROM routine to read sector
$0E (boot1). Boot0 creates a few data
tables, copies boot1 to zero page,
modifies it to accommodate booting from
any slot, and jumps to it.

Boot0 is loaded at $0800 by the disk
controller ROM routine.

; tell the ROM to load only this sector
; (we'll do the rest manually)
0800-  [01]

; The accumulator is $01 after loading
; sector $00, or $03 after loading
; sector $0E. We don't need to preserve
; the value, so we just shift the bits
; to determine whether this is the
; first or second time we've been here.
0801-   4A          LSR

; second run -- we've loaded boot1, so
; skip to boot1 initialization routine
0802-   D0 0D       BNE   $0811

; first run -- increment the physical
; sector to read (this will be the next
; sector under the drive head, so we'll
; waste as little time as possible
; waiting for the disk to spin)
0804-   E6 3D       INC   $3D

; X holds the boot slot (x16) --
; munge it into $Cx format (e.g. $C6
; for slot 6, but we need to accommodate
; booting from any slot)
0806-   8A          TXA
0807-   20 7B F8    JSR   $F87B
080A-   09 C0       ORA   #$C0

; push address (-1) of the sector read
; routine in the disk controller ROM
080C-   48          PHA
080D-   A9 5B       LDA   #$5B
080F-   48          PHA

; "return" via disk controller ROM,
; which reads boot1 into $0900 and
; exits via $0801
0810-   60          RTS
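The same arithmetic as a Python
sketch (rom_return_address is a
made-up name). The boot slot arrives in
X as slot x 16, and the code pushes an
address one less than the ROM's
sector-read entry point at $Cx5C,
because RTS adds 1 to whatever it
pops.

```python
def rom_return_address(slot16):
    """Mimic $0806-$080F for a boot slot given
    as slot x 16 (e.g. $60).
    Returns (value pushed, RTS target)."""
    high = (slot16 >> 4) | 0xC0   # JSR $F87B (shift right 4), ORA #$C0
    pushed = (high << 8) | 0x5B   # PHA; LDA #$5B; PHA
    return pushed, pushed + 1     # RTS adds 1
```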

; Execution continues here (from $0802)
; after boot1 code has been loaded into
; $0900. On real Apple hardware, the Y
; register is always 0 at $0801, but it
; turns out the CFFA 3000 firmware does
; not always match this behavior --
; which is exactly the sort of bug that
; qkumba enjoys(*) uncovering -- so we
; initialize Y here (to 1, which is the
; value of the accumulator after the
; drive firmware loaded physical sector
; $02 (logical $0E), bumped its sector
; counter to $03, and we performed an
; LSR).
0811-   A8          TAY

(*) not guaranteed, actual enjoyment
    may vary

; save boot slot in later code (by the
; time this code executes, we will have
; overwritten all of zero page, so the
; usual location of $2B won't have the
; boot slot information anymore)
0812-   8A          TXA
0813-   8D 9E 08    STA   $089E

; munge the boot slot, e.g. $60 -> $EC
; (to be used later)
0816-   09 8C       ORA   #$8C

; Copy the boot1 code from $0901..$09FF
; to zero page. ($0900 holds the 0boot
; version number. This is version 3.
; $0000 is initialized later in boot1.)
0818-   BE 00 09    LDX   $0900,Y
081B-   96 00       STX   $00,Y
081D-   C8          INY
081E-   D0 F8       BNE   $0818

; There are a number of places in boot1
; that need to hit a slot-specific soft
; switch (read a nibble from disk, turn
; off the drive, &c). Rather than the
; usual form of "LDA $C08C,X", we will
; use "LDA $C0EC" and modify the $EC
; byte in advance, based on the boot
; slot. $00E4 is an array of all the
; places in the boot1 code that need
; this adjustment.
0820-   C8          INY
0821-   B6 E4       LDX   $E4,Y
0823-   95 00       STA   $00,X
0825-   D0 F9       BNE   $0820

; munge $EC -> $E0 (used later to
; advance the drive head to the next
; track)
0827-   29 F0       AND   #$F0
0829-   85 CB       STA   $CB

; munge $E0 -> $E8 (used later to
; turn off the drive motor)
082B-   09 08       ORA   #$08
082D-   85 D3       STA   $D3
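A Python sketch of the three patched
bytes for any slot. It assumes the
usual Disk II register layout: stepper
phases start at $C080,X, motor off is
$C088,X, and the data latch is
$C08C,X, where X is slot x 16.
patch_bytes is a made-up name.

```python
def patch_bytes(slot16):
    """Low bytes patched into boot1's LDA $C0xx
    instructions for a boot slot given as
    slot x 16."""
    latch = slot16 | 0x8C         # $0816: $C08C+X, read data latch
    phase0 = latch & 0xF0         # $0827: $C080+X, phase 0 off
    motor_off = phase0 | 0x08     # $082B: $C088+X, motor off
    return latch, phase0, motor_off
```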

; push sector interleave array to the
; bottom of the stack (by setting the
; stack pointer to #$0F and pushing
; #$10 bytes, those bytes will end up
; in $0100..$010F)
082F-   A2 0F       LDX   #$0F
0831-   9A          TXS
0832-   BD B5 08    LDA   $08B5,X
0835-   48          PHA
0836-   CA          DEX
0837-   10 F9       BPL   $0832

For reference, this is the sector
interleave array:

08B0- .. .. .. .. .. 00 07 0E
08B8- 06 0D 05 0C 04 0B 03 0A
08C0- 02 09 01 08 0F
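A small Python model of the stack
trick at $082F-$0837. It assumes the
6502 convention that PHA writes to
$0100+SP and then decrements SP
(wrapping at 8 bits), and it treats
memory as a plain list.

```python
# physical sector -> logical sector
INTERLEAVE = [0x00, 0x07, 0x0E, 0x06,
              0x0D, 0x05, 0x0C, 0x04,
              0x0B, 0x03, 0x0A, 0x02,
              0x09, 0x01, 0x08, 0x0F]


def push_table(table):
    mem = [0] * 0x200
    sp = 0x0F                         # LDX #$0F / TXS
    for x in range(0x0F, -1, -1):     # LDA $08B5,X ... DEX/BPL
        mem[0x100 + sp] = table[x]    # PHA stores at $0100+SP
        sp = (sp - 1) & 0xFF          # ...then decrements SP
    return mem, sp


mem, sp = push_table(INTERLEAVE)
```

Afterwards SP has wrapped to $FF, so
later pushes land at the top of the
stack page, far from the table.
Indexing the table with a physical
sector number gives the logical one;
physical $02 is logical $0E, for
example.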

; push several addresses to the
; stack (more on this later)
0839-   A9 08       LDA   #$08
083B-   48          PHA
083C-   A9 9C       LDA   #$9C
083E-   48          PHA
083F-   A2 06       LDX   #$06
0841-   B5 DE       LDA   $DE,X
0843-   48          PHA
0844-   CA          DEX
0845-   D0 FA       BNE   $0841

; number of tracks to load (x2) (game-
; specific -- this game uses 7 tracks)
0847-   A0 0E       LDY   #$0E

; loop starts here
0849-   8A          TXA

; the carry was set by the "LSR" at
; $0801, so we won't take this branch
; the first time (but, as we will see
; shortly, the carry gets flipped off
; and on, and we end up taking this
; branch every second time through the
; loop)
084A-   90 0F       BCC   $085B

; check if we want to change the target
; address to store the track data
084C-   B9 C3 08    LDA   $08C3,Y

; 0 = no change (each track is stored
; in the next $1000 bytes in memory
; unless we change it)
084F-   F0 07       BEQ   $0858
0851-   48          PHA

; X is #$00 going into this loop, and
; it never changes, so now A is #$00.
0852-   8A          TXA

; Push $00DA to the stack, which (when
; we pop it from the stack later) will
; "return" to $00DB. That routine sets
; the target address to store the data
; from the next track we read.
0853-   48          PHA
0854-   A9 DA       LDA   #$DA
0856-   48          PHA

; Push $0000 to the stack to "return"
; to $0001, which reads a track into
; memory
0857-   8A          TXA

; Note that execution ends up here
; regardless -- either we fell through
; from $0857 or we branched from $084F.
; Either way, A is #$00.
0858-   48          PHA
0859-   48          PHA

; There's a "SEC" hidden here (because
; it's opcode $38), but it's only
; executed if we take the branch at
; $084A, which lands at $085B, which is
; in the middle of this instruction.
; Otherwise we execute the compare,
; which clears the carry bit because A
; is always #$00 at this point. The
; carry therefore flip-flops between
; set and clear, and the BCC at $084A
; is taken only every other time.
085A-   C9 38       CMP   #$38

; Push $00B6 to the stack, to "return"
; to $00B7. This routine advances the
; drive head to the next half track.
085C-   48          PHA
085D-   A9 B6       LDA   #$B6
085F-   48          PHA

; loop until done
0860-   88          DEY
0861-   D0 E6       BNE   $0849

Because of the carry flip-flop, we will
push $00B6 to the stack every time
through the loop, but we will only push
$0000 every other time. Occasionally we
also push a byte for the new target
address and the address of the routine
that changes it.

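; push $00B6 (half-track advance) to
; the stack $18 more times. These pop
; first and move the head from track
; $00 to track $0C; the first two $00B6
; entries from the loop above then take
; it to track $0D, where the game code
; starts, since tracks $03-$0C hold the
; level data.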
0863-   A0 18       LDY   #$18
0865-   8A          TXA
0866-   48          PHA
0867-   A9 B6       LDA   #$B6
0869-   48          PHA
086A-   88          DEY
086B-   D0 F8       BNE   $0865

The final stack looks like this:

 --top--
  $00B6 (move to track $00.5)
  $00B6 (move to track $01)

  .
  . [repeated]
  .

  $00B6 (move drive to track $0C.5)
  $00B6 (move drive to track $0D)
  $0000 (read track into $0F00..$1EFF)

  $00B6 \
  $00B6  \  move to track $0E,
  $00DA   } set target address
  $60    /  to $6000,
  $0000 /   and read entire track

  $00B6 \
  $00B6  } move to T0F, read into $7000
  $0000 /

  .
  . [repeated for each track]
  .

  $00B6 \
  $00B6  } move to T13, read into $B000
  $0000 /

  $FE88 (IN#0, pushed at $0841)
  $FE92 (PR#0, pushed at $0841)
  $00D1 (turn off drive motor)
  $089C (final setup, pushed at $0839)

  .
  . [unused]
  .

  $00070E060D050C040B030A020901080F
        (sector interleave table)
--bottom--

Boot1 reads the game into memory from
tracks $0D-$13, but it isn't a loop.
One routine advances the drive head,
another routine reads a track, and a
third routine changes the address to
store data in memory. We're essentially
unrolling the read loop on the stack in
advance. Each routine gets called when
we need it, as many times as we need
it. Like dancers in a chorus line, each
routine does its dance then cedes the
spotlight. Each seems unaware of the
others, but in reality they've all
been meticulously choreographed.
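To see the choreography in action,
here is a small Python simulation that
replays the pushes at $0849-$086B and
then pops them the way RTS would. It is
a sketch: the text doesn't list the
target table at $08C3+Y, so the code
assumes (from the disk map) that only
the entry checked when Y=2 is nonzero,
holding $60. Function names are made
up.

```python
# Assumption: only the Y=2 entry of the
# $08C3+Y table is nonzero ($60).
TARGETS = {2: 0x60}


def build_stack(targets):
    """Replay the pushes at $0849-$086B. The end
    of the list is the top of the stack; each
    entry is the value that was pushed."""
    stack = []
    carry = True                  # set by LSR at $0801
    for y in range(0x0E, 0, -1):  # LDY #$0E ... DEY/BNE
        if carry:
            page = targets.get(y, 0)
            if page:              # $0851-$0856
                stack.append(page)
                stack.append(0x00DA)
            stack.append(0x0000)  # $0858-$0859
            carry = False         # CMP #$38 with A=0
        else:
            carry = True          # hidden SEC at $085B
        stack.append(0x00B6)      # $085C-$085F
    stack += [0x00B6] * 0x18      # $0863-$086B
    return stack


def replay(stack):
    """Pop entries like RTS (jump to value + 1)
    and record each read as (track, first page)."""
    half_track = 0                # head starts on T$00
    page = 0x0F                   # LDA #$0F at $0005
    reads = []
    while stack:
        target = stack.pop() + 1
        if target == 0x00B7:      # move one phase
            half_track += 1
        elif target == 0x0001:    # read a whole track
            reads.append((half_track // 2, page))
            page += 0x10          # 16 x INC $06
        elif target == 0x00DB:    # set a new target page
            page = stack.pop()
    return reads


reads = replay(build_stack(TARGETS))
```

In this model the $00DA entry for
track $0E comes off right after the
track $0D read, before the two arm
moves; since moving the arm and
changing the target page don't
interact, either order loads the same
data.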

                   ~

               Chapter 9
                 6 + 2


Before I can explain the next chunk of
code, I need to pause and explain a
little bit of theory. As you probably
know if you're the sort of person who
reads this sort of thing, Apple II
floppy disks do not contain the actual
data that ends up being loaded into
memory. Due to hardware limitations of
the original Disk II drive, data on
disk must be stored in an intermediate
format called "nibbles." Bytes in
memory are encoded into nibbles before
writing to disk, and nibbles that you
read from the disk must be decoded back
into bytes. The round trip is lossless
but requires some bit wrangling.

Decoding nibbles-on-disk into bytes-in-
memory is a multi-step process. In
"6-and-2 encoding" (used by DOS 3.3,
ProDOS, and all ".dsk" image files),
there are 64 possible values that you
may find in the data field (in the
range $96..$FF, but not all of those,
because some of them have bit patterns
that trip up the drive firmware). We'll
call these "raw nibbles."

Step 1: read $156 raw nibbles from the
data field. These values will range
from $96 to $FF, but as mentioned
earlier, not all values in that range
will appear on disk.

Now we have $156 raw nibbles.

Step 2: decode each of the raw nibbles
into a 6-bit byte between 0 and 63
(%00000000 and %00111111 in binary).
$96 is the lowest valid raw nibble, so
it gets decoded to 0. $97 is the next
valid raw nibble, so it's decoded to 1.
$98 and $99 are invalid, so we skip
them, and $9A gets decoded to 2. And so
on, up to $FF (the highest valid raw
nibble), which gets decoded to 63.
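Step 2 amounts to "rank among the
valid nibbles." A Python sketch of
that, with one caveat: the text doesn't
spell out which nibbles are valid, so
is_valid uses the constraints commonly
given for DOS 3.3 (high bit set, at
most one pair of adjacent zero bits,
at least one pair of adjacent one bits
not counting bit 7). Those rules leave
exactly 64 values, from $96 to $FF.

```python
def is_valid(n):
    """Assumed 6-and-2 rules: high bit set, at
    most one pair of adjacent 0 bits and at
    least one pair of adjacent 1 bits in bits
    0-6."""
    if n < 0x80:
        return False
    bits = [(n >> i) & 1 for i in range(7)]
    pairs = list(zip(bits, bits[1:]))
    zero_pairs = sum(1 for a, b in pairs if a == b == 0)
    one_pairs = sum(1 for a, b in pairs if a == b == 1)
    return zero_pairs <= 1 and one_pairs >= 1


# valid raw nibbles in ascending order
RAW = [n for n in range(0x100) if is_valid(n)]
# Step 2: a raw nibble decodes to its rank
DECODE = {raw: i for i, raw in enumerate(RAW)}
```

$D5 and $AA are not among them, so
neither can turn up inside a data
field.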

Now we have $156 6-bit bytes.

Step 3: split up each of the first $56
6-bit bytes into pairs of bits. In
other words, each 6-bit byte becomes
three 2-bit bytes. These 2-bit bytes
are merged with the next $100 6-bit
bytes to create $100 8-bit bytes. Hence
the name, "6-and-2" encoding.

The exact process of how the bits are
split and merged is... complicated. The
first $56 6-bit bytes get split up into
2-bit bytes, but those two bits get
swapped (so %01 becomes %10 and vice-
versa). The other $100 6-bit bytes each
get multiplied by 4 (a.k.a. bit-shifted
two places left). This leaves a hole in
the lower two bits, which is filled by
one of the 2-bit bytes from the first
group.

A diagram might help. "a" through "x"
each represent one bit.

             -------------

1 decoded      3 decoded
nibble in  +   nibbles in   =  3 bytes
first $56      other $100


00abcdef       00ghijkl
               00mnopqr
   |           00stuvwx
   |
 split            |
   &           shifted
swapped        left x2
   |              |
   V              V

000000fe   +   ghijkl00   =   ghijklfe
000000dc   +   mnopqr00   =   mnopqrdc
000000ba   +   stuvwx00   =   stuvwxba

             -------------

Tada! Four 6-bit bytes

  00abcdef
  00ghijkl
  00mnopqr
  00stuvwx

become three 8-bit bytes

  ghijklfe
  mnopqrdc
  stuvwxba
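The diagram translates directly into
Python. This sketch covers only the
bit shuffle, with no nibble decoding
or checksum: one value from the first
$56 plus three from the other $100
become three bytes.

```python
def swap2(pair):
    """Swap the two bits: %01 <-> %10,
    %00 and %11 stay put."""
    return ((pair & 1) << 1) | (pair >> 1)


def merge(aux, mains):
    """aux = %00abcdef, mains = three 6-bit
    values. Returns the bytes ghijklfe,
    mnopqrdc, stuvwxba."""
    out = []
    for i, m in enumerate(mains):
        pair = (aux >> (2 * i)) & 3       # ef, then cd, then ab
        out.append((m << 2) | swap2(pair))
    return out
```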

When DOS 3.3 reads a sector, it reads
the first $56 raw nibbles, decodes them
into 6-bit bytes, and stashes them in a
temporary buffer (at $BC00). Then it
reads the other $100 raw nibbles,
decodes them into 6-bit bytes, and puts
them in another temporary buffer (at
$BB00). Only then does DOS 3.3 start
combining the bits from each group to
create the full 8-bit bytes that will
end up in the target page in memory.
This is why DOS 3.3 "misses" sectors
when it's reading, because it's busy
twiddling bits while the disk is still
spinning.
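For comparison, a sketch of the "read
everything, then combine" approach
described above, in Python. It takes
all $56 + $100 6-bit values (already
decoded, in the order they came off
the disk) and builds the 256-byte
sector in one pass afterwards. It
assumes first-group value j supplies
the low bits of bytes j, j+$56 and
j+$AC, which is how the 0boot loops
later in this write-up split the work;
it is not a transcription of the DOS
3.3 RWTS.

```python
def swap2(pair):
    return ((pair & 1) << 1) | (pair >> 1)


def combine(aux, main):
    """aux: $56 6-bit values, main: $100 6-bit
    values. Returns the 256 data bytes."""
    assert len(aux) == 0x56 and len(main) == 0x100
    out = []
    for i, m in enumerate(main):
        group, j = divmod(i, 0x56)        # which bit pair, which aux value
        pair = (aux[j] >> (2 * group)) & 3
        out.append((m << 2) | swap2(pair))
    return out
```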

                   ~

              Chapter 10
             Shift Happens


0boot also uses "6-and-2" encoding. The
first $56 nibbles in the data field are
still split into pairs of bits that
need to be merged with nibbles that
won't come until later. But instead of
waiting for all $156 raw nibbles to be
read from disk, it "interleaves" the
nibble reads with the bit twiddling
required to merge the first $56 6-bit
bytes and the $100 that follow. By the
time 0boot gets to the data field
checksum, it has already stored all
$100 8-bit bytes in their final resting
place in memory. This means that 0boot
can read all 16 sectors on a track in
one revolution of the disk. That's
crazy fast.

To make it possible to do all the bit
twiddling we need to do and not miss
nibbles as the disk spins(*), we do
some of the work earlier. We multiply
each of the 64 possible decoded values
by 4 and store those values. (Since
this is accomplished by bit shifting
and we're doing it before we start
reading the disk, this is called the
"pre-shift" table.) We also store all
possible 2-bit values in a repeating
pattern that will make it easy to look
them up later. Then, as we're reading
from disk (and timing is tight), we can
simulate all the bit math we need to do
with a series of table lookups. There
is just enough time to convert each raw
nibble into its final 8-bit byte before
reading the next nibble.

(*) The disk spins independently of the
    CPU, and we only have a limited
    time to read a nibble and do what
    we're going to do with it before
    WHOOPS HERE COMES ANOTHER ONE. So
    time is of the essence. Also, "As
    The Disk Spins" would make a great
    name for a retrocomputing-themed
    soap opera.

The first table, at $0200..$02FF, has
three useful columns and 64 rows.
Astute readers will notice that 3 x 64
is not 256: there is also a fourth,
unused column, which exists because
multiplying by 3 is hard but
multiplying by 4 is easy (in base 2
anyway). The three columns correspond
to the three pairs of 2-bit values in
those first $56 6-bit bytes. Since the
values are only 2 bits wide, each
column holds one of four different
values (%00, %01, %10, or %11).

The second table, at $036C..$03D5, is
the "pre-shift" table. This contains
all the possible 6-bit bytes, in order,
each multiplied by 4 (a.k.a. shifted to
the left two places, so the 6 bits that
started in positions 0-5 are now in
positions 2-7, and positions 0 and 1
are zeroes). Like this:

       00ghijkl   -->   ghijkl00

Astute readers will notice that there
are only 64 possible 6-bit bytes, but
this second table is larger than 64
bytes. To make lookups easier, the
table has empty slots for each of the
invalid raw nibbles. In other words, we
don't do any math to decode raw nibbles
into 6-bit bytes; we just look them up
in this table (offset by $96, since
that's the lowest valid raw nibble) and
get the required bit shifting for free.


addr | raw |  decoded 6-bit | pre-shift
-----+-----+----------------+----------
$36C | $96 |  0 = %00000000 | %00000000
$36D | $97 |  1 = %00000001 | %00000100
$36E | $98        [invalid raw nibble]
$36F | $99        [invalid raw nibble]
$370 | $9A |  2 = %00000010 | %00001000
$371 | $9B |  3 = %00000011 | %00001100
$372 | $9C        [invalid raw nibble]
$373 | $9D |  4 = %00000100 | %00010000
  .
  .
  .
$3D4 | $FE | 62 = %00111110 | %11111000
$3D5 | $FF | 63 = %00111111 | %11111100


Each value in this "pre-shift" table
also serves as an index into the first
table (with all the 2-bit bytes). This
wasn't an accident; I mean, that sort
of magic doesn't just happen. But the
table of 2-bit bytes is arranged in
such a way that we take one of the raw
nibbles that needs to be decoded and
split apart (from the first $56 raw
nibbles in the data field), use that
raw nibble as an index into the pre-
shift table, then use that pre-shifted
value as an index into the first table
to get the 2-bit value we need. That's
a neat trick.

; this loop creates the pre-shift table
; at $36C
086D-   A2 6A       LDX   #$6A
086F-   1E 6B 03    ASL   $036B,X
0872-   1E 6B 03    ASL   $036B,X
0875-   CA          DEX
0876-   D0 F7       BNE   $086F

Wait, what?

It turns out the drive firmware already
creates a table that looks very similar
to the pre-shift table we want... it's
just not shifted yet! Since we're not
calling the drive firmware anymore, we
can take full advantage of this table
that's guaranteed to be in memory.

And this is the result (".." means the
address is unused):

0368-             00 04 .. ..
0370- 08 0C .. 10 14 18 .. ..
0378- .. .. .. .. 1C 20 .. ..
0380- .. 24 28 2C 30 34 .. ..
0388- 38 3C 40 44 48 4C .. 50
0390- 54 58 5C 60 64 68 .. ..
0398- .. .. .. .. .. .. .. ..
03A0- .. 6C .. 70 74 78 .. ..
03A8- .. 7C .. .. 80 84 .. 88
03B0- 8C 90 94 98 9C A0 .. ..
03B8- .. .. .. A4 A8 AC .. B0
03C0- B4 B8 BC C0 C4 C8 .. ..
03C8- CC D0 D4 D8 DC E0 .. E4
03D0- E8 EC F0 F4 F8 FC
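A Python sketch of what those two ASLs
do to the firmware's table. It assumes
the firmware leaves each valid
nibble's decoded value at $036C +
(nibble - $96); unused slots are None
here because their contents don't
matter. The is_valid rules are the
usual DOS 3.3 ones, not something taken
from the 0boot code.

```python
def is_valid(n):
    """Assumed DOS 3.3 nibble rules."""
    bits = [(n >> i) & 1 for i in range(7)]
    pairs = list(zip(bits, bits[1:]))
    return (n >= 0x80
            and sum(a == b == 0 for a, b in pairs) <= 1
            and sum(a == b == 1 for a, b in pairs) >= 1)


# firmware-style table, $036C..$03D5: slot
# (nibble - $96) holds the decoded 6-bit value
firmware = [None] * 0x6A
valid = [n for n in range(0x96, 0x100) if is_valid(n)]
for value, raw in enumerate(valid):
    firmware[raw - 0x96] = value

# 0boot's loop: ASL $036B,X twice for X = $6A..$01
preshift = list(firmware)
for x in range(0x6A, 0, -1):
    if preshift[x - 1] is not None:
        preshift[x - 1] = (preshift[x - 1] << 2) & 0xFF
```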

; this loop creates the table of 2-bit
; values at $200, magically arranged to
; enable easy lookups later
0878-   C8          INY
0879-   46 BA       LSR   $BA
087B-   46 BA       LSR   $BA
087D-   B5 EB       LDA   $EB,X
087F-   99 FF 01    STA   $01FF,Y
0882-   E6 AF       INC   $AF
0884-   A5 AF       LDA   $AF
0886-   25 BA       AND   $BA
0888-   D0 05       BNE   $088F
088A-   E8          INX
088B-   8A          TXA
088C-   29 03       AND   #$03
088E-   AA          TAX
088F-   C8          INY
0890-   C8          INY
0891-   C8          INY
0892-   C8          INY
0893-   C0 04       CPY   #$04
0895-   B0 E6       BCS   $087D
0897-   C8          INY
0898-   C0 04       CPY   #$04
089A-   90 DD       BCC   $0879

And this is the result:

0200- 00 00 00 .. 00 00 02 ..
0208- 00 00 01 .. 00 00 03 ..
0210- 00 02 00 .. 00 02 02 ..
0218- 00 02 01 .. 00 02 03 ..
0220- 00 01 00 .. 00 01 02 ..
0228- 00 01 01 .. 00 01 03 ..
0230- 00 03 00 .. 00 03 02 ..
0238- 00 03 01 .. 00 03 03 ..
0240- 02 00 00 .. 02 00 02 ..
0248- 02 00 01 .. 02 00 03 ..
0250- 02 02 00 .. 02 02 02 ..
0258- 02 02 01 .. 02 02 03 ..
0260- 02 01 00 .. 02 01 02 ..
0268- 02 01 01 .. 02 01 03 ..
0270- 02 03 00 .. 02 03 02 ..
0278- 02 03 01 .. 02 03 03 ..
0280- 01 00 00 .. 01 00 02 ..
0288- 01 00 01 .. 01 00 03 ..
0290- 01 02 00 .. 01 02 02 ..
0298- 01 02 01 .. 01 02 03 ..
02A0- 01 01 00 .. 01 01 02 ..
02A8- 01 01 01 .. 01 01 03 ..
02B0- 01 03 00 .. 01 03 02 ..
02B8- 01 03 01 .. 01 03 03 ..
02C0- 03 00 00 .. 03 00 02 ..
02C8- 03 00 01 .. 03 00 03 ..
02D0- 03 02 00 .. 03 02 02 ..
02D8- 03 02 01 .. 03 02 03 ..
02E0- 03 01 00 .. 03 01 02 ..
02E8- 03 01 01 .. 03 01 03 ..
02F0- 03 03 00 .. 03 03 02 ..
02F8- 03 03 01 .. 03 03 03 ..
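A Python sketch that builds a table
with the same layout as the dump
above. Rather than replaying the loop
at $0878, it computes each entry
directly: the row at offset v x 4 (the
pre-shifted value of v) holds the
swapped bit pairs of v, with bits 4-5
in column 0, bits 2-3 in column 1 and
bits 0-1 in column 2. Column 3 stays
None. low_bits is a made-up name.

```python
def swap2(pair):
    return ((pair & 1) << 1) | (pair >> 1)


two_bit = [None] * 256                   # $0200..$02FF
for v in range(64):
    for col in range(3):
        shift = 2 * (2 - col)            # col 0: bits 4-5 ... col 2: bits 0-1
        two_bit[v * 4 + col] = swap2((v >> shift) & 3)


def low_bits(preshifted, col):
    """What EOR $0200+col,X fetches when X
    holds a pre-shifted value."""
    return two_bit[preshifted + col]
```

Loop #2 reads column 2 ($0202,X),
loop #3 column 1 and loop #4 column 0,
each indexed by a pre-shifted value
from the first $56 nibbles.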

; And that's all she wrote. Everything
; else is already lined up on the
; stack. All that's left to do is
; "return" and let the stack guide us
; through the rest of the boot.
089C-   60          RTS

[Note to future self: $089C..$08FF is
 available for game-specific init code,
 but it can't rely on or disturb zero
 page in any way. That rules out a lot
 of built-in ROM routines; be careful.]

                   ~

              Chapter 11
              0boot boot1


The rest of the boot runs from zero
page. It's hard to show you exactly
what boot1 will look like, because it
relies heavily on self-modifying code.

In a standard DOS 3.3 RWTS, the
softswitch to read the data latch is
"LDA $C08C,X", where X is the boot slot
times 16 (to allow disks to boot from
any slot). 0boot also supports booting
from any slot, but instead of using an
index, each fetch instruction is pre-
set based on the boot slot. Not only
does this free up the X register, it
lets us juggle all the registers and
put the raw nibble value in whichever
one is convenient at the time. (We take
full advantage of this freedom.) I've
marked each pre-set softswitch with
"o_O" to remind you that self-modifying
code is awesome.

There are several other instances of
addresses and constants that get
modified while boot1 is running. I've
marked these with "/!\" to remind you
that self-modifying code is dangerous
and you should not try this at home.

The first thing popped off the stack is
the drive arm move routine at $00B7. It
moves the drive exactly one phase (half
a track).

00B7-   E6 BA       INC   $BA

; This value was set at $00B7 (above).
; It's incremented monotonically, but
; it's ANDed with $03 later, so its
; exact value isn't relevant.
00B9-   A0 00       LDY   #$00      /!\

; short wait for PHASEON
00BB-   A9 04       LDA   #$04
00BD-   20 C3 00    JSR   $00C3

; fall through
00C0-   88          DEY

; longer wait for PHASEOFF
00C1-   69 41       ADC   #$41
00C3-   85 CE       STA   $CE

; calculate the proper stepper motor to
; access
00C5-   98          TYA
00C6-   29 03       AND   #$03
00C8-   2A          ROL
00C9-   AA          TAX

; This address was set at $0827,
; based on the boot slot.
00CA-   BD E0 C0    LDA   $C0E0,X   /!\

; This value was set at $00C3 so that
; PHASEON and PHASEOFF have optimal
; wait times.
00CD-   A9 D1       LDA   #$D1      /!\

; wait exactly the right amount of time
; after accessing the proper stepper
; motor
00CF-   4C A8 FC    JMP   $FCA8
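The index math at $00C5-$00C9 as a
Python sketch. ROL shifts the carry
into bit 0, so the same instructions
reach either the odd ("on") or even
("off") soft switch for phase Y AND 3.
The base $C0E0 assumes slot 6, and
phase_switch is a made-up name.

```python
def phase_switch(y, carry, base=0xC0E0):
    """TYA / AND #$03 / ROL / TAX, then
    LDA base,X. base is $C080 + slot x 16."""
    x = ((y & 3) << 1) | (1 if carry else 0)
    return base + x
```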

Since the drive arm routine only moves
one phase, it was pushed to the stack
twice before each track read. Our game
is stored on whole tracks; this half-
track trickery is only to save a few
bytes of code in boot1. (Hey, we're on
zero page; space is tight!)

The track read routine starts at $0001,
because that let us save 1 byte in the
boot0 code when we were pushing
addresses to the stack. (We could just
push $00 twice.)

; sectors-left-to-read-on-this-track
; counter (incremented to $00)
0001-   A2 F0       LDX   #$F0
0003-   86 00       STX   $00

We initialize an array at $00DF that
tracks which sectors we've read from
the current track. Astute readers will
notice that this part of zero page had
real data in it -- some addresses that
were pushed to the stack, and some
other values that were used to create
the 2-bit table at $0200. All true, but
all those operations are now complete,
and the space is now available for
unrelated uses.

The array is in logical sector order;
we convert physical to logical sectors
immediately after reading the address
field. Values are the actual pages in
memory where that sector should go, and
they get zeroed once the sector is read
(so we don't waste time decoding the
same sector twice).

; starting address (game-specific;
; this one starts loading at $0F00)
0005-   A9 0F       LDA   #$0F      /!\
0007-   95 EF       STA   $EF,X
0009-   E6 06       INC   $06
000B-   E8          INX
000C-   D0 F7       BNE   $0005
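Zero page indexing wraps around at
$FF, which is why "STA $EF,X" with X
running from $F0 to $FF fills
$DF..$EE. A Python model of the loop
at $0001-$000C, using $0F (the first
track's starting page) by default;
init_sector_array is a made-up name.

```python
def init_sector_array(first_page=0x0F):
    """Model $0001-$000C: fill the 16-entry
    sector array with consecutive pages.
    Returns (zero page dict, final $06)."""
    zp = {}
    page = first_page                  # operand of LDA at $0005
    for x in range(0xF0, 0x100):       # LDX #$F0 ... INX/BNE
        zp[(0xEF + x) & 0xFF] = page   # STA $EF,X wraps in zero page
        page += 1                      # INC $06
    return zp, page


zp, next_page = init_sector_array()
```

Since $0006 is left at $1F
afterwards, the next track continues
at $1F00 unless the routine at $00DB
changes it first.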

000E-   20 D5 00    JSR   $00D5

; subroutine reads a nibble and
; stores it in the accumulator
00D5-   AD EC C0    LDA   $C0EC     o_O
00D8-   10 FB       BPL   $00D5
00DA-   60          RTS

Continuing from $0011...

; first nibble must be $D5
0011-   C9 D5       CMP   #$D5
0013-   D0 F9       BNE   $000E

; read second nibble, must be $AA
0015-   20 D5 00    JSR   $00D5
0018-   C9 AA       CMP   #$AA
001A-   D0 F5       BNE   $0011

; We actually need the Y register to be
; $AA for unrelated reasons later, so
; let's set that now. (We have time,
; and it saves 1 byte!)
001C-   A8          TAY

; read the third nibble
001D-   20 D5 00    JSR   $00D5

; is it $AD?
0020-   49 AD       EOR   #$AD

; Yes, which means this is the data
; prologue. Branch forward to start
; reading the data field.
0022-   F0 22       BEQ   $0046

If that third nibble is not $AD, we
assume it's the end of the address
prologue. ($96 would be the third
nibble of a standard address prologue,
but we don't actually check.) We fall
through and start decoding the 4-4
encoded values in the address field.

0024-   A0 02       LDY   #$02

The first time through this loop,
we'll read the disk volume number.
The second time, we'll read the track
number. The third time, we'll read
the physical sector number. We don't
actually care about the disk volume or
the track number, and once we get the
sector number, we don't verify the
address field checksum.

0026-   20 D5 00    JSR   $00D5
0029-   2A          ROL
002A-   85 AF       STA   $AF
002C-   20 D5 00    JSR   $00D5
002F-   25 AF       AND   $AF
0031-   88          DEY
0032-   10 F2       BPL   $0026
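The ROL/AND pair is standard 4-and-4
decoding. A round-trip sketch in
Python; encode44 exists only to test
decode44 and assumes the usual 4-and-4
layout (odd bits in the first nibble,
even bits in the second, the other
bits all set). The carry going into
ROL is 1 here: the CMP #$AA at $0018
set it, and each ROL shifts a 1 (bit 7
of a nibble) back out into it.

```python
def encode44(value):
    """Split a byte into two 4-and-4 nibbles:
    odd bits first, then even bits."""
    return (value >> 1) | 0xAA, value | 0xAA


def decode44(first, second, carry=1):
    """ROL (shifting the carry into bit 0),
    then AND with the second nibble."""
    rolled = ((first << 1) | carry) & 0xFF
    return rolled & second
```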

; take physical sector number (in A)
; and use it to look up the logical
; sector number
0034-   AA          TAX
0035-   BC 00 01    LDY   $0100,X

; store logical sector number
0038-   84 AF       STY   $AF

; use logical sector number as an
; index into the sector address array
; to get the target page (where we want
; to store this sector in memory)
003A-   B6 DF       LDX   $DF,Y

; store the target page in several
; places throughout the following code
003C-   86 9E       STX   $9E
003E-   CA          DEX
003F-   86 6E       STX   $6E
0041-   86 86       STX   $86
0043-   E8          INX

; This is an unconditional branch,
; because the ROL at $0029 will always
; set the carry. We're done processing
; the address field, so we need to loop
; back and wait for the data prologue.
0044-   B0 C8       BCS   $000E

; execution continues here (from $0022)
; after matching the data prologue
0046-   E0 00       CPX   #$00

; If X is $00, either we found a data
; prologue before we found an address
; prologue (so we don't know which
; sector this is or where to put it),
; or the sector array entry was already
; zeroed because we've read this sector
; before. Either way, skip it.
0048-   F0 C4       BEQ   $000E

Nibble loop #1 reads nibbles $00..$55,
looks up the corresponding offset in
the preshift table at $036C, and stores
that offset in the temporary buffer at
$0300.

; initialize rolling checksum to $00
004A-   85 58       STA   $58
004C-   AE EC C0    LDX   $C0EC     o_O
004F-   10 FB       BPL   $004C

; The nibble value is in the X register
; now. The lowest possible nibble value
; is $96 and the highest is $FF. To
; look up the offset in the table at
; $036C, we need to subtract $96 from
; $036C and add X.
0051-   BD D6 02    LDA   $02D6,X

; Now the accumulator has the offset
; into the table of individual 2-bit
; combinations ($0200..$02FF). Store
; that offset in the temporary buffer
; at $0300, in the order we read the
; nibbles. But the Y register started
; counting at $AA, so we need to
; subtract $AA from $0300 and add Y.
0054-   99 56 02    STA   $0256,Y

; The EOR value is set at $004A
; each time through loop #1.
0057-   49 00       EOR   #$00      /!\
0059-   C8          INY
005A-   D0 EE       BNE   $004A

Here endeth nibble loop #1.

Nibble loop #2 reads nibbles $56..$AB,
combines them with bits 0-1 of the
appropriate nibble from the first $56,
and stores them in bytes $00..$55 of
the target page in memory.

005C-   A0 AA       LDY   #$AA
005E-   AE EC C0    LDX   $C0EC     o_O
0061-   10 FB       BPL   $005E
0063-   5D D6 02    EOR   $02D6,X
0066-   BE 56 02    LDX   $0256,Y
0069-   5D 02 02    EOR   $0202,X

; This address was set at $003F
; based on the target page (minus 1
; so we can add Y from $AA..$FF).
006C-   99 56 D1    STA   $D156,Y   /!\
006F-   C8          INY
0070-   D0 EC       BNE   $005E

Here endeth nibble loop #2.

Nibble loop #3 reads nibbles $AC..$101,
combines them with bits 2-3 of the
appropriate nibble from the first $56,
and stores them in bytes $56..$AB of
the target page in memory.

0072-   29 FC       AND   #$FC
0074-   A0 AA       LDY   #$AA
0076-   AE EC C0    LDX   $C0EC     o_O
0079-   10 FB       BPL   $0076
007B-   5D D6 02    EOR   $02D6,X
007E-   BE 56 02    LDX   $0256,Y
0081-   5D 01 02    EOR   $0201,X

; This address was set at $0041
; based on the target page (minus 1
; so we can add Y from $AA..$FF).
0084-   99 AC D1    STA   $D1AC,Y   /!\
0087-   C8          INY
0088-   D0 EC       BNE   $0076

Here endeth nibble loop #3.

Nibble loop #4 reads nibbles $102..$155,
combines them with bits 4-5 of the
appropriate nibble from the first $56,
and stores them in bytes $AC..$FF of
the target page in memory.

008A-   29 FC       AND   #$FC
008C-   A2 AC       LDX   #$AC
008E-   AC EC C0    LDY   $C0EC     o_O
0091-   10 FB       BPL   $008E
0093-   59 D6 02    EOR   $02D6,Y
0096-   BC 54 02    LDY   $0254,X
0099-   59 00 02    EOR   $0200,Y

; This address was set at $003C
; based on the target page.
009C-   9D 00 D1    STA   $D100,X   /!\
009F-   E8          INX
00A0-   D0 EC       BNE   $008E

Here endeth nibble loop #4.
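A quick Python check of the "minus 1"
addressing in loops #2 through #4,
using $D2 as an example target page:
loops #2 and #3 add Y ($AA..$FF) to a
base on the previous page, loop #4
adds X ($AC..$FF) to a base on the
target page itself, and together they
should cover the whole page exactly
once. store_addresses is a made-up
name.

```python
def store_addresses(page):
    """Addresses written by loops #2-#4 for a
    given target page (e.g. $D2)."""
    below = (page - 1) << 8            # bases patched at $003F/$0041
    loop2 = [below + 0x56 + y for y in range(0xAA, 0x100)]
    loop3 = [below + 0xAC + y for y in range(0xAA, 0x100)]
    loop4 = [(page << 8) + x for x in range(0xAC, 0x100)]  # base set at $003C
    return loop2, loop3, loop4


l2, l3, l4 = store_addresses(0xD2)
```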

; Finally, get the last nibble,
; which is the checksum of all
; the previous nibbles.
00A2-   29 FC       AND   #$FC
00A4-   AC EC C0    LDY   $C0EC     o_O
00A7-   10 FB       BPL   $00A4
00A9-   59 D6 02    EOR   $02D6,Y

; If checksum fails, start over.
; Note: we really want to branch
; to $000E, but that's too far,
; so we're branching to an earlier
; unrelated "BCS" which branches
; to $000E. The carry is always
; set at this point (it was set
; by the "CPX #$00" all the way
; back at $0046), so the BCS is
; an unconditional jump and we
; end up where we want (at $000E).
00AC-   D0 96       BNE   $0044
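The "too far" problem is easy to check by hand. A 6502 branch is two bytes, and its offset is a signed byte counted from the address just past the branch, so it can reach at most 128 bytes back or 127 bytes forward. Here's a tiny Python helper (mine, not part of any tool) that computes the offset byte, or None when the target is out of range:

```python
def branch_offset(pc, target):
    """Offset byte for a 2-byte 6502 branch at pc, or None if unreachable."""
    delta = target - (pc + 2)      # relative to the next instruction
    if -128 <= delta <= 127:
        return delta & 0xFF        # two's complement byte
    return None


print(hex(branch_offset(0x00AC, 0x0044)))  # the BNE at $00AC -> 0x96
print(branch_offset(0x00AC, 0x000E))       # where we really want to go -> None
```

$000E is 160 bytes behind $00AE, so it's out of reach, while $0044 is only 106 bytes back, which is exactly the $96 in the listing.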

; This was set to the logical
; sector number (at $0038), so
; this is an index into the
; 16-byte array at $00DF.
00AE-   A0 00       LDY   #$00      /!\

; store #$00 at this index in the
; sector array to indicate that
; we've read this sector
00B0-   96 DF       STX   $DF,Y

; are we done yet?
00B2-   E6 00       INC   $00

; nope, loop back to read more sectors
00B4-   D0 8E       BNE   $0044

; And that's all she read.
00B6-   60          RTS

0boot's track read routine is done when
$0000 hits $00, which is astonishingly
beautiful. Like, "now I know God" level
of beauty.
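Here's one way to model that bookkeeping in Python. It's a hypothetical model, not 0boot's code, and it assumes three things this listing doesn't show: that $0000 starts out as the negative of the number of sectors wanted on this track, that a nonzero entry in the 16-byte array at $00DF means "still need this one", and that sectors whose entry is already $00 get skipped. The arrival_order parameter is likewise made up: the logical sector numbers in the order they pass under the drive head.

```python
def read_track(wanted, arrival_order):
    """Hypothetical model of 0boot's per-track bookkeeping (not its code).

    wanted: 16 entries, nonzero = still need that logical sector
            (assumed meaning of the array at $00DF).
    arrival_order: logical sector numbers in the order they spin by.
    Returns the sectors in the order they were read.
    """
    wanted = list(wanted)
    # Assumed: the byte at $0000 starts as -(sectors wanted), mod 256.
    counter = (256 - sum(1 for w in wanted if w)) & 0xFF
    read = []
    while counter != 0:
        progress = False
        for sector in arrival_order:
            if not wanted[sector]:
                continue                     # already read or never needed
            wanted[sector] = 0               # STX $DF,Y (X is $00 here)
            read.append(sector)
            progress = True
            counter = (counter + 1) & 0xFF   # INC $00
            if counter == 0:                 # BNE falls through: done
                break
        if not progress:
            raise ValueError("a wanted sector never came around")
    return read
```

The nice part is that "are we done?" costs nothing extra: the INC already sets the zero flag, so the BNE that loops back doubles as the exit test.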

And so it goes: we pop another address
off the stack, move the drive arm, read
another track, and so on. Eventually we
finish moving and reading, moving and
reading, and we get to the home stretch
and start calling ROM routines:

  $FE88 (IN#0, pushed at $0841)
  $FE92 (PR#0, pushed at $0841)

Next on the stack:

  $00D1 (turn off drive motor)

00D2-   AD E8 C0    LDA   $C0E8     /!\

Note that this routine falls through to
the one at $00D5, which reads a nibble
from disk, but that's harmless.

And the last thing on the stack:

  $089C (final setup, pushed at $0839)

...which jumps to $089D, which was part
of track 0, sector 0.
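If the "pushed address + 1" business looks off by one, that's because it is, on purpose. RTS pulls a 16-bit address off the stack and jumps to one past it, so a program that wants to "return" into the ROM's IN#0 entry point at $FE89 pushes $FE88. Here's a rough Python model that treats the stack as a plain list of bytes (the real one lives at $0100..$01FF and grows downward). The helper names are mine:

```python
def push_return(stack, target):
    """Push target-1 so that a later RTS lands on target (high byte first)."""
    addr = (target - 1) & 0xFFFF
    stack.append(addr >> 8)       # PHA of the high byte
    stack.append(addr & 0xFF)     # PHA of the low byte


def rts(stack):
    """Pull low byte, then high byte, and add one, like the 6502's RTS."""
    lo = stack.pop()
    hi = stack.pop()
    return ((hi << 8 | lo) + 1) & 0xFFFF


# Push in reverse order of execution: the last thing pushed runs first.
stack = []
for target in (0x089D, 0x00D2, 0xFE93, 0xFE89):
    push_return(stack, target)
while stack:
    print(hex(rts(stack)))
```

This prints $FE89, $FE93, $00D2, and $089D in that order: IN#0, PR#0, motor off, final setup. Each of those routines ends in its own RTS, which is what pops the next address and keeps the chain going.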

; this boot slot was modified
; earlier (at $0813)
089D-   A9 60       LDA   #$60      /!\

; set up the second-stage RWTS
; (the original disk did this at
; $0441 in the sector that it read
; at the last minute, but I can't
; jump to it directly because it's
; intertwingled with a call to the
; original RWTS on the text page,
; which doesn't exist)
089F-   8D E9 B7    STA   $B7E9
08A2-   8D F7 B7    STA   $B7F7
08A5-   4A          LSR
08A6-   4A          LSR
08A7-   4A          LSR
08A8-   4A          LSR
08A9-   AA          TAX

; tell second-stage RWTS that we're
; on track $13, so it doesn't grind
; the disk when it loads level data
08AA-   A9 13       LDA   #$13
08AC-   9D 78 04    STA   $0478,X
08AF-   9D F8 04    STA   $04F8,X

; jump to the game code (originally
; at $0450) to initialize zero page
; and start the game
08B2-   4C 50 90    JMP   $9050
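The slot juggling at $089D..$08AF is just shifting. The boot slot is kept as slot*16 ($60 for slot 6), four LSRs turn that back into 6, and the per-slot track bytes end up at $0478+6 and $04F8+6. As I understand DOS 3.3's RWTS, those two screen-hole tables remember the current track for drive 1 and drive 2. Here's a Python sketch that returns the writes as an address-to-value dict (the function name is mine):

```python
def second_stage_setup(slot, track=0x13):
    """Model of the stores at $089F..$08AF: returns {address: value}."""
    slot16 = (slot << 4) & 0xFF      # LDA #$60 for slot 6
    x = slot16 >> 4                  # LSR x4, TAX -> slot number
    return {
        0xB7E9: slot16,              # STA $B7E9
        0xB7F7: slot16,              # STA $B7F7
        0x0478 + x: track,           # STA $0478,X
        0x04F8 + x: track,           # STA $04F8,X
    }


for addr, value in sorted(second_stage_setup(6).items()):
    print(f"${addr:04X} <- ${value:02X}")
```

Storing slot*16 rather than the slot number is the usual Disk II convention, because it can go straight into X to index the $C080+slot*16 soft switches.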

The entire boot process takes about two
seconds -- a 5x speed increase from the
original disk. Copy protection is
expensive.

Quod erat liberandum.

                   ~

           Acknowledgements


Thanks to qkumba for writing 0boot, for
explaining 6-and-2 encoding to me, for
reviewing drafts of this write-up, and
for being that rare combination of
smart and kind. (Kids, don't put up
with genius jerks. There are genius
non-jerks. Find them. Nurture them.
Cherish them.)

                   ~

               Changelog


2020-06-24

- typo in the 6-and-2 encoding diagram
  [thanks Andrew R.]

2018-10-20

- initial release

---------------------------------------
A 4am crack                    No. 1899
------------------EOF------------------
